>> Now, I want to introduce
our first keynote speaker
who essentially is
the mastermind of Azure.
Azure is the way we draw boxes around all of these things.
So, Azure is of course
our public cloud,
but Azure or Azure Stack is also
our private cloud and the two
together form the hybrid cloud,
and Azure also covers our IoT or Edge story with the smaller devices.
So, the mastermind behind
that is Mark Russinovich,
who is the CTO of Azure.
He got his PhD from
Carnegie Mellon University,
and since then, has had
a very fruitful career with
many, many activities.
So, he's really passionate
about distributed systems,
about security, about
operating systems.
He's been working on
Windows and Azure
obviously for a long time and
he has created many tools,
so he's the author of the Sysinternals administration tools for Windows.
He had several stints
among others at IBM.
But in 1996, he started
a company called Winternals,
which was eventually acquired
by Microsoft in 2006.
Since 2006, he has been
with Microsoft and
has essentially helped create Azure and hold Azure together.
One of the very interesting
facts about Mark is that
he's a very passionate speaker
but also a very
passionate author,
and he wrote books, not only technical books, but also fiction, technical fiction you could call it. I haven't read them; he has three of those books by now, but I've actually looked at them, and I'm going to read one of them, and you can look them up.
So with that, please welcome
Mark Russinovich.
Thank you very much.
>> So, I'm really
disappointed to hear
he hasn't read them.
Well, anyway, thank
you for having me.
It's a pleasure to be here.
I want to thank MSR for inviting me to kick off this Faculty Research Summit.
I'm the Chief Technology
Officer of Azure.
I've been in Azure for
eight years, and in that role,
I work on the
technical strategy and
architecture for the platform.
Now, this morning I've got
about an hour with you
and the topic of this session
is Azure Architecture.
Azure's architecture has grown over the last 10 years, since Azure was really started, to become a very broad and complicated platform.
So, 60 minutes is certainly not
enough time to do it justice.
I can give you
a superficial look at
the architecture
in broad brush strokes, but I thought, in the process,
I'd also share with you insights
into some of the
collaboration that
we've had going with
Microsoft Research.
I think Microsoft Research, in my view, is a competitive advantage we have here at Microsoft.
Having this research body where
people are off exploring,
sometimes, what we think are crazy ideas, but they then prove to be interesting.
We actually have incorporated
many of these ideas.
We've partnered very deeply with them
across a number of
different projects.
So, during this
architecture overview,
I'm going to weave in
some of those projects
and show how they've
impacted Azure.
So, Microsoft Data Centers.
Let me talk a little bit about
the overall data center strategy,
which is to divide
the world into geographies.
We divide the world
into geographies
largely based on data
sovereignty boundaries.
More and more we see
customers wanting
Azure Data Centers within
their country because of
the regulatory requirements,
data sovereignty requirements of
businesses to have the data
within the country boundaries.
So, this is why we're now at
54 regions worldwide across
a number of
different geographies,
and when we go into one
of these geographies,
we go in with
at least two regions.
These regions are defined by a latency envelope of less than two milliseconds.
So, going from any data center to any other data center within what we call a region, which is what we expose to customers, is less than two milliseconds. The region pairs, as we call them, are two regions that are designed for customers to be able to build a disaster recovery solution on top of. Any of our services that support geo data replication are required to support replication between the geo pairs, although they can replicate to other regions as well.
The distance between them is typically several hundred miles
because we want to place
them far enough apart
to survive large-scale disasters.
So, large hurricanes,
large earthquakes, floods.
But within the region,
we actually divide the region
into availability zones.
Availability zones give
customers the ability to create
a highly available solution on
top of data centers
that can fail.
So, data centers we see
can have power outages,
they can have
local floods and fires,
they can also be impacted by
other environmental
conditions in the area.
So, we typically place the
data centers that make up
these availability zones several
kilometers apart at least.
The requirement there, again, is that the data centers have the same latency envelope that the region has, less than two milliseconds, which enables customers to build a solution that synchronously replicates data across the data centers.
Such that, a service that
uses quorum to
commit durable data,
like our storage services
or our database service,
can replicate the data across
three availability zones,
and that's our minimum
that we go into a region
with when we light up
with availability zones.
Then, they can tolerate the loss of a full availability zone, or the data centers associated with it, and still continue to serve write operations. They can actually tolerate the loss of two and still continue to serve read data off of their service.
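To make the quorum idea concrete, here is a minimal sketch in Python of majority-quorum writes across three zones; the zone names and logic are illustrative only, not the actual Azure Storage protocol.

```python
# Minimal sketch of majority-quorum commit across three availability
# zones (illustrative only, not the actual Azure Storage protocol).
class ZoneReplica:
    def __init__(self, zone):
        self.zone, self.alive, self.data = zone, True, {}

def write(replicas, key, value):
    """A write commits only if a majority of zones (2 of 3) are up."""
    up = [r for r in replicas if r.alive]
    if len(up) < len(replicas) // 2 + 1:
        raise RuntimeError("no write quorum")
    for r in up:
        r.data[key] = value

def read(replicas, key):
    """A read can be served from any surviving replica."""
    for r in replicas:
        if r.alive and key in r.data:
            return r.data[key]
    raise RuntimeError("no replica available")

zones = [ZoneReplica(z) for z in ("AZ1", "AZ2", "AZ3")]
write(zones, "k", "v1")
zones[0].alive = False       # lose one full availability zone
write(zones, "k", "v2")      # writes still succeed with a 2-of-3 quorum
zones[1].alive = False       # lose a second zone
print(read(zones, "k"))      # reads still work off the last replica: v2
```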
Now, this is
the high-level strategy.
I thought it's always
fun to take a look at
actual pictures of data centers,
and so I've got some
to share with you
just to show you the kind of
scale that we're getting to.
The amount of capital investment
now Microsoft is putting
into our data center build out
around the world is on
the order of $10 billion,
and this is likely to grow.
Microsoft Azure has doubled
over the last few years,
and this is a statement
that we make
publicly about the
compute usage doubling.
You can see that in our latest quarterly earnings call.
We announced 89 percent
year-over-year growth
of Azure revenue,
and that corresponds to the build
out that we've got going
with these data centers.
This one is in
Quincy, Washington.
Here's another look at that
from a different angle
where you can see
the eight data centers
that I just showed you in
the picture there towards
their receding horizon,
and then you can see
another data center
here in the foreground.
Quincy is a great place
because it's
served by renewable energy
from the Columbia River,
and there's plenty of
land of course there.
We actually offer
data center tours there.
So, sometime if you
come to campus,
it's about a three-hour
drive out there,
but then you can go and take
a look and walk through
one of our data centers.
Here's Cheyenne, Wyoming.
This is another area where
we've got plenty of land, energy,
water to expand, and you
can see that we've got
massive expansion
going on in that area.
Land is being cleared for an additional data center there.
This is Dublin, Ireland.
Another place where we've
got massive data centers.
There's actually six
in this picture.
There's three there on
the left and three there on
the right building out over time,
and we'll continue to build
new ones there as well.
Amsterdam is another site. Amsterdam actually has plenty of great water, very reliable energy,
and you can see here,
we've cleared land
for data center.
There's a data center
in front of us.
That building off to the left looks like it might be a data center; it's actually not a data center.
Anybody know what that is?
It's a greenhouse, and that means
it can be growing
one of two crops,
both of them legal in
the state of Washington.
Then, here's another look
at Amsterdam,
that building that we
were looking at is off
there on the bottom right.
So, you can see that
this is taken slightly after
that previous picture,
so those data centers had
been built out and there's
more land being cleared
for new data centers.
Now, one of the
projects that we're
looking at with
Microsoft Research,
and this is one Don was talking about. Microsoft Research's goal is to surprise us, and this one surprised me when, a few years ago,
I heard about this
Project Natick.
Project Natick was MSR
looking and saying,
"Hey, can we go build data
centers on the ocean floor?"
That just sounded crazy to me.
The idea here is we can drop
tanks onto the ocean floor.
We can cool them with the ambient water cooling that comes with being on the ocean floor, and the currents.
Actually, the cooling is
very efficient because it's
already cold down there.
When you submerge them,
the other idea is that you
can then have them operate
for longer periods of time,
you can expand them very
easily because you just
drop additional tanks
in the water.
So, the first Natick project
was announced publicly
a few years ago.
There was an article actually in
the New York Times about it,
where we dropped just
a single rack in
a small cylinder into
about 30 feet of water,
and we let it sit
there for a month,
and then they pulled it up,
and it seemed like things
were still working.
So, we said "Cool",
onto the next phase.
This next phase is
one that we just
announced about a month ago.
Look at the cylinder here: it's a little over 12 meters in length, about three meters in diameter, and it has 12 racks that are 42U in height,
and in this build out,
we've got about a
thousand servers
there, decommissioned
Azure servers.
Of course, the storage has been
pulled out and shredded
on the way out of
the data center,
and so there's new
hard disks and SSDs here,
but they are
the actual same servers
that we've been operating
in our Azure Data Centers.
The tubes are built
actually at a facility in
France that builds submarines,
and they're very good at
building these underwater
technologies.
It takes 30 days.
It took 30 days from
the finalization
of construction at the plant in
France to this being
online at the bottom
of the ocean floor.
So, that shows you the kind
of agility that we
might get out of this.
Expanding capacity
when you're doubling
year-over-year is
a tough problem.
If we can simply expand on demand wherever we need to, by shipping a new tank out and having it deployed within 30 days,
that will let us meet
that demand very efficiently.
This particular experiment has
been done off the Orkney Islands
in Northern Scotland.
The tank has been shipped
out to about a kilometer
offshore and then submerged
into about 100 feet,
36 meters of water,
and then you can
see some pictures
here of it being dropped.
Now, one of the benefits of this, and some of the constraints and interesting problems that we're looking at as part of the research: when the racks are put in that tube, all of the air is sucked out of it, and all of the vapor is sucked out as well. Both of these contribute to corrosive effects on the servers.
Further, once
the tube is dropped,
it is in a fail in place mode,
which means that as the
servers and components fail,
we simply let the system degrade.
Once it's become degraded enough, what we do is bring another tube, a replacement, drop it in the water, and then decommission the old one, migrating workloads and data off of the degraded tube onto the new capacity.
So, very efficient way to
manage the servers inside.
The idea being that,
the servers will last longer,
so our average server lifetime in
a data center is
roughly four to five years.
So, we think this can extend
the life cycle of those servers
because in addition to
the lack of corrosive effects
on the servers,
there's no maintenance going on,
and maintenance
disturbs the servers,
which can cause problems as well.
The PUE, the power usage effectiveness, that we get out of this: if you take a look at typical advanced, state-of-the-art data centers on land that are designed with traditional technologies, the PUEs are up in the 1.1-plus range. The PUE that we expect on this is about 1.03.
So, this offers, like I said,
a bunch of potential advantages
including the energy efficiency
that we get out of this.
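For reference, PUE is total facility power divided by IT equipment power, so 1.0 is perfect; a quick back-of-the-envelope with illustrative numbers:

```python
# PUE = total facility power / IT equipment power; 1.0 would be perfect.
def pue(it_kw, overhead_kw):
    return (it_kw + overhead_kw) / it_kw

print(pue(1000, 110))   # ~1.11: typical state-of-the-art land data center
print(pue(1000, 30))    # ~1.03: the expected Natick figure
```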
Let me take a look at
Azure networking now.
So, if you take a look
at Azure networking,
it's actually a very broad set
of services and technologies.
You can see some of them listed
here by different categories.
For example, over in the DC hardware space,
we've got specialized
network hardware,
including open-source software
that we're putting in
our switches that we released to
the Open Compute Project.
You can see we've got a bunch of
different services that are
available to customers.
The core one is virtual networks. Virtual networks are the abstractions that customers bring and layer IP addresses on top of, to connect their virtual machines together and connect them with our Azure services.
Then, as part of that network topology, we also allow connection from outside the data center.
One of the examples
of a service that we
make available to IT customers
is something called ExpressRoute,
which gives them
private connections into
our dark fiber backbone
that gives them
high SLAs, up to 100 gigabits of dedicated bandwidth into our backbone, so they don't have to rely on ISPs and the flakiness of the open Internet.
Speaking of that dark
fiber backbone,
we've got one of the largest
networks in the world.
We've got over 30,000 miles of fiber spanning the world. You can see a map here; it isn't necessarily up to date, this is a few months old.
But, you can get an idea of
the kind of global
coverage we've got,
including subsea cables
that we've laid ourselves.
One example is a cable from Bilbao to Virginia called Marea, which we launched a couple of years ago. A subsea cable like that is a massive capital expense. We went into it with Telxius and Facebook,
but all three of us
recognizing that
the existing subsea cables
connecting Europe and the US,
almost all go from
England to New York,
which causes a potential
single point of failure.
So, in addition to building out
an additional 150 terabits of
capacity between
Europe and the US,
this gives us some redundancy
with respect to
those failure points of
the existing cables that
you can see up there.
We've also launched
another subsea cable
connecting Asia and
the US as well recently.
In addition to the dark fiber, we also have about 4,500 peering connection locations to connect with other networks around the world, including ISPs, and then we also have 130 Edge sites.
The Edge sites are where we
deploy our own
software and servers,
where we can cache
software and we can run
compute on the edge,
do SSL termination and
give customers that
are using our Cloud services
a better experience.
We expect those point of
presence sites to also be part
of our third party Edge strategy
going forward, so
we're looking at that.
Now, one of the aspects of networking, having just focused on the physical aspects in the previous slide, is that networking in the cloud is what's called software-defined networking. Many of you are probably aware of this: software-defined networking became well-known about five to seven years ago, when we had companies being bought for a billion dollars because they were developing software-defined technologies.
The idea here is
if you take a look
at traditional IT networking,
it's been an appliance model,
where multiple layers of
this networking stack are
combined into a single appliance.
Those layers are
the management layer,
which takes the commands to configure and deploy networks in the data center.
The control layer, which takes those abstractions that are exposed as APIs and actually translates them into things like ACLs and routing rules;
and then the data plane,
which is programmed with
those ACLs and routing rules
to honor what the control
plane decides,
and the data flows through
the data plane, of course.
With software-defined
networking, we
take those and spread them out,
and we implement them
all in software,
at least mostly in software,
I'll get to a caveat
on that in a second.
If you take a look at the way
we've done this in Azure,
the layers are the Azure
Resource Manager,
which is the universal control plane
for all of the Azure APIs.
If you talk to compute,
if you talk to storage,
if you talk to Cosmos DB,
on the control plane to create
resources and to configure them,
you're going through
Azure Resource Manager.
So, when you create
a virtual network,
you go through
Azure Resource Manager.
Azure Resource Manager hands that
off to the network
resource provider,
which implements those APIs,
and then that calls into
network services
that are controllers
to go and set up and
define those virtual network
ACLs and routing rules,
and then it talks to agents on
the servers or the host
to implement those rules.
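For a rough sense of what that control-plane path looks like from the customer's side, here is a sketch using the Azure Python management SDK; the resource group and names are hypothetical, and the exact call shape may vary by SDK version.

```python
# Sketch: creating a virtual network from the customer side. The call
# goes to Azure Resource Manager, which hands off to the network
# resource provider; names here are hypothetical, and the call shape
# may differ across SDK versions (needs azure-identity and
# azure-mgmt-network).
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.virtual_networks.begin_create_or_update(
    "demo-rg",       # resource group (hypothetical)
    "demo-vnet",
    {
        "location": "westus2",
        "address_space": {"address_prefixes": ["10.0.0.0/16"]},
    },
)
vnet = poller.result()
print(vnet.name, vnet.address_space.address_prefixes)
```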
The idea here is that we want to push as much logic down as possible. The advantages of software-defined networking are kind of obvious here: we're not paying for expensive proprietary appliances, which have reliability issues and scalability issues.
What we've got here are cloud-native layers of the networking stack from the top to the bottom that we can expand, that are highly available, that can tolerate massive failures, including failures of availability zones, and that are much cheaper to operate.
Now, I just mentioned the software-defined networking stack in the context of Azure networking; here you can see the layers of the whole Azure architecture at a very high level.
Up at the top there on the far right, you can see the Azure portal, the command line interface, and the SDKs, which talk these management APIs, the control plane APIs, to Azure Resource Manager.
Then, you can see some example services plugged into Azure Resource Manager; we call those resource providers, like I just mentioned.
Then, beneath that, you've got the Azure Fabric Controller, which is responsible for launching virtual machines. Everything except for a few services at the very bottom of our architectural stack runs in virtual machines, including things like our database services and Cosmos DB running in virtual machines.
There's a hardware manager that manages the life cycle and health of the servers, and then the hardware infrastructure underneath.
Then, you can see that
there's authentication,
role-based access
control, telemetry
systems that span
the entire software stack.
Now, if we focus just on
the networking part of this,
this is a zoom in on what
I was explaining earlier,
we've got Azure Resource Manager,
the Network Resource Provider,
and the Compute
Resource Provider.
There's no network
without compute and so,
these work closely together.
They work closely together because a customer says, "I want to define a virtual network, and then I want to place a virtual machine into that virtual network."
The job then of the system is to
go find a computer to put
that virtual machine on and
then join that to
the virtual network.
So, what you'll see is that,
these components all
meet down on the server,
each talking to their own agents.
There's a network agent
down on the server,
there's a compute agent
on the server
that launches a virtual machine.
The network agent then plumbs in IP addresses and routing rules into that virtual machine to have it join the virtual network.
So, another area where we've
collaborated with
Microsoft Research,
and this one goes
back to a project
that started in MSR about 2010.
One of the MSR researchers, Doug Burger, who is now part of the Azure team in the hardware group, had this vision that FPGAs had a role in the modern data center architecture. FPGAs have come and gone throughout time, and they fell out of vogue for over a decade.
Doug believed that they had a place, and he took his vision to Bing and started a project with Bing to see if they could accelerate the index serving of Bing content on FPGAs.
That project pilot was
successful and then deployed
into production in Bing.
Now, we faced an issue
in Azure about five
or six years ago,
where we saw networks going from
1 gigabit to 10 gigabit
on the servers.
We saw 40 gigabit coming,
50 gigabits coming next year,
100 gigabits around the corner.
When we took a look at processing that amount of data on the servers themselves, on the compute servers, on the CPUs, we realized that we'd be burning large amounts of those CPUs doing nothing but packet processing.
So the question is, how
could we do this
more efficiently?
How could we keep up not
just with the bandwidth,
but keep latency low for
that amount of data processing?
Our network engineers found
out about the Bing project,
Doug found out about
our problem and we started to
collaborate together on something
called accelerated networking.
The idea here is that we take those FPGAs. Just a quick refresher on FPGAs for those of you who might not have been around for those previous phases, or who just know them from the acronym: they're small devices with tens of thousands of logic elements, typically DSPs on them, interfaces to the PCIe bus, and a USB controller.
The idea is that the interconnect between them is dynamically programmable within a very short period of time. Within a few milliseconds, you can program the interconnects between these elements, and that allows you to create a reconfigurable topology.
Because these kinds of devices have so much redundancy of the same kinds of programmable computational elements, they are great for highly parallel workloads, and packet processing is a highly parallel workload.
So, looking at these,
we said, "Hey,
maybe we can plumb
the SDN stack into an FPGA."
Effectively, getting close to
hardware ASIC performance,
but with the flexibility of
being able to update
the algorithms in
the software-defined
networking as we go.
In fact, we have added
many, many features,
dozens of features just in
the last couple of years to
our software-defined
networking stack.
So, we absolutely need
that flexibility.
We cannot burn the software-defined
networking algorithms
into ASICs because
that would just not
allow us to let the network
at large evolve.
So, what we did is,
we figured out how to plumb
the networking stack
into the FPGA,
so that, as in the previous slide where I talked about the networking agent on the servers getting those control plane rules for ACLs, routes, and IP addresses, it plumbs those into flow tables that are implemented in soft logic on those FPGAs.
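To make the flow-table idea concrete, here is a toy match-action table in Python; it's a conceptual stand-in for the FPGA soft logic, not the real data path.

```python
# Toy match-action flow table (a conceptual stand-in for the flow
# tables implemented in FPGA soft logic, not the real data path).
flow_table = {}   # (src_ip, dst_ip, dst_port) -> action

def slow_path_decide(src_ip, dst_ip, dst_port):
    """Software slow path: apply ACLs, routes, etc. once per flow."""
    return "drop" if dst_port == 23 else "forward"

def process_packet(src_ip, dst_ip, dst_port):
    """First packet of a flow misses and is decided in software; the
    decision is cached so later packets bypass the host CPU."""
    key = (src_ip, dst_ip, dst_port)
    if key not in flow_table:
        flow_table[key] = slow_path_decide(*key)   # plumb the rule
    return flow_table[key]

print(process_packet("10.0.0.4", "10.0.0.5", 443))  # slow path, then cached
print(process_packet("10.0.0.4", "10.0.0.5", 443))  # fast-path hit
```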
You can see the comparison as we moved to accelerated networking: we started deploying FPGAs in every single server about three years ago, so now we think we have the most FPGAs deployed in production of any company in the world.
The left side of that slide shows you the software architecture, where we're processing those packets on the host, on the servers, and you can see there are a lot of software layers that we've got to go through.
On the right side, you can see we bypass all that and allow the virtual machine to talk directly to a network interface that's virtualized and provided directly to that virtual machine, so that it can just talk directly to the hardware, and there are no computational resources consumed other than some control plane operations on the CPUs.
That frees up the CPUs, of course, to do more useful things, like actually run virtual machines instead of processing packets.
So, I'm going to show you
a quick demo of comparison
between those two different
technologies here in action.
First, I've got
two virtual machines here
that I've got without accelerated
networking turned on.
Accelerated networking
is available in
all virtual machines
around the world in Azure,
it's an opt-in currently.
Because we wanted to make
sure that there were
no compatibility issues with
operating systems
that customers use.
So right now, it's opt-in,
soon it will be opt-out.
But on this virtual machine,
I have it opted out.
What I've done is just set up
a receiver with a program
called "Sockperf."
Here's another
virtual machine also
opted out that will talk to
that other one and send it
packets as quickly as it can.
What we're going to see after about 15 seconds of running is the latency that these two virtual machines experience talking to one another over the software-defined networking stack as it's implemented in the host.
So you can see the latency,
it's about 200 microseconds,
which isn't bad.
But let's go take a look at
what accelerated
networking can give us.
So, I've got here another program, and this one is using something called DPDK on a Linux VM, which lets the VM bypass the Linux kernel and process packets extremely fast, with very low latency, so we're going to see something that looks a little bit better.
So, that's 10 microseconds, and we typically see latencies between 5 and 10 microseconds, as compared to about 200 microseconds on the software-defined stack. Again, with almost no CPU usage versus the software stack.
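If you want a feel for what sockperf is measuring, here is a minimal UDP ping-pong latency probe in Python; sockperf and DPDK are the real tools, this is just illustrative.

```python
# Minimal UDP ping-pong latency probe, in the spirit of the sockperf
# demo (illustrative; use sockperf or DPDK for real measurements).
import socket
import statistics
import time

def echo_server(port=9999):
    """Run this on the receiver VM: echo every datagram back."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("0.0.0.0", port))
    while True:
        data, addr = s.recvfrom(64)
        s.sendto(data, addr)

def measure(host, port=9999, n=1000):
    """Run this on the sender VM: estimate one-way latency."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        s.sendto(b"ping", (host, port))
        s.recvfrom(64)
        samples.append((time.perf_counter() - t0) / 2)  # RTT / 2
    print(f"avg one-way latency: {statistics.mean(samples) * 1e6:.1f} us")
```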
Okay, let's switch
back. All right.
So let's talk a little bit
about servers now.
I'm going to take you
on a tour of history
here to show you the evolution
of servers in the Cloud.
Because it's kind of interesting: back when I started in Azure, the whole premise of cloud was cheap, commodity, scale-out servers. Scale-up was anathema to cloud, but watch what happens.
Here are our Generation Two servers, which we launched Azure with in 2010. They're two-socket Opterons with six cores on each socket, so a total of 12 cores, 32 gigabytes of RAM, and a 1 gigabit network adapter.
I'm going to quickly
move through these.
Gen Three, we jumped
to 10 gigabits.
Here's the HPC SKU we started to introduce. What we did here with our HPC SKUs, which we still have, is InfiniBand: 40 gigabit InfiniBand networking, which offers latencies that are even down to 1 to 2 microseconds, as opposed to what we saw there.
By the way, that was TCP over InfiniBand; InfiniBand itself is of course a different type of network protocol, it's not TCP compatible.
Here, we have our Gen
Four servers,
you can see that
the amount of RAM
that we're starting
to deploy is larger.
There's more CPUs,
and this lets us pack
more virtual machines
onto the same server.
Then, we introduced something called Godzilla, and here's the first example of a massive scale-up. This was about three years ago, when we saw customers starting to migrate large in-memory databases to the cloud.
Specifically SAP HANA: one of the core workloads in IT is a scale-up, in-memory database, and customers needed very large virtual machines to be able to support the kinds of computation that their enterprises ran on it.
So, we introduced this one, called Godzilla. It has 512 gigabytes of memory, and the largest virtual machines on it have 450 gigabytes of memory.
Here's our Gen 5.1.
So, this is what we've been
deploying for last few years.
Then you can see,
we have started to
introduce GPUs and these are
NVIDIA GPUs P100s, P40s.
Then, we introduced this about a year and a half ago; we called it Beast. This one, again: customers were migrating even more and larger SAP workloads to the cloud, including Microsoft IT, and we needed even larger virtual machines. This one has 4 terabytes of RAM in it and 144 processors.
So, this now is what we've got
currently in production
as our largest VM size.
You can see our Gen Six server,
this is what we're deploying
right now into production,
and you can see it's got
768 gigabytes of RAM in it.
So, we continue to rise up on RAM
and CPUs and what we just
announced about
a month ago, is this.
We called it Beast B2 internally,
but I think we should
call it Mega Beast.
This thing has
12 terabytes of RAM in it.
So, we're the first
public Cloud to announce
support for 12 terabyte
virtual machines.
That, of course, unblocks SAP workloads as they move to the cloud. So, not quite commodity scale-out anymore.
Well, all of this
is built on top of
an open source platform
called Project Olympus.
Project Olympus is
designed to be extremely
flexible to support
high density storage,
to support SSD, hard disks,
also to support
all the different kinds of
processor technologies we
want to put into the cloud,
including AMD, ARM, as
well as Intel processors.
We also support NVDIMM.
So, we support batteries in these servers that power the RAM, so that we can tolerate a power outage but treat that RAM as non-volatile, writing it out to flash before the server completely loses power.
This is designed to go
to 50 gigabit NICs.
Like I said, we're
already looking at
100 gigabit coming at us.
Project Olympus is actually open; we contributed it to the Open Compute Project. So, anybody can go look at these designs and build their own servers, the idea being that we innovate with the ecosystem.
In fact, Olympus Gen Six was the first version of our server design where we actually took the initial design out, made it available publicly, and sought feedback from all our hardware partners; they all made contributions back into it that eventually made it into the final Gen Six Project Olympus platform.
Now, we're also looking at
introducing specialized hardware
into our data centers.
We've done things like Cerberus, which we announced: a specialized security co-processor that we put into our servers.
But another place
that we're looking
at is accelerating deep learning.
This is another project
led by Doug Burger
focusing on the use of FPGAs
for deep neural networks.
His idea here is again,
machine learning technology
is changing very quickly.
So, instead of burning topologies into ASICs and having them be static and rigid, with short lifetimes, if we can leverage FPGAs, we can adapt to new algorithms and new topologies as they come up.
So, the first version of this is called Project Brainwave, and we released it in public preview at Build earlier this year.
The idea here, of course,
is that we can plumb
these DNNs into FPGAs.
These FPGAs are on
servers that are
connected with a
low-latency network and so,
we can actually take
very large DNNs that don't
fit into a single FPGA
and deploy them
across multiple servers
to have them
operate as basically
hardware as a service, a micro-service platform; that's Doug's vision.
Now, I'm going to talk a little bit about one of the specific use cases here, and that is something called AI for Earth, a collaboration between us, ESRI, and the Chesapeake Conservancy to go look at land and evaluate how it's changing with respect to climate change and other factors.
Traditionally, this has been done by humans: satellite imagery is captured, and then humans go and look at these images and decide how much forest, how much water, how much urban space is in the image, and then they track that over time. So, hundreds of hours, thousands of hours have been spent by humans on this up to now.
Now, with Brainwave, we can take those images. The first neural network that we programmed into Brainwave is ResNet-50. It's a customizable ResNet-50, where the customer can train their own model as the final layer, and then even package that up and deploy it on on-premises Brainwave, but it's deployed as a service in Azure. Once the customer trains their model on top of ResNet-50, they deploy it into Brainwave VMs and can then access it through an API, and that's exactly what this has done.
This pipeline shows you the flow. I'm going to switch to a demo to show you Brainwave in action.
So, first I'm going to show you this Explorer land cover classification. If I just select one of these, it's going to execute the inference on Brainwave.
Now, Brainwave can do inference on an image in less than two milliseconds, and it can drive throughput up to 40 tera-operations per second off an FPGA device, even at a batch size of one.
So, this is what we call real-time AI, something that is differentiated from other, traditionally batch-oriented devices like a GPU, where you've got to feed in 100 or 200 images at once to get that kind of throughput out of the device.
Now, here's the AI for Earth app, and what I'm going to do is process these images using the CPU. You can see the CPU is processing about six a second, and the batch-oriented GPU about 60. You can see the progress down here as it classifies those images, and then here's Brainwave.
Then, like I mentioned,
we can scale up Brainwave.
So, here's Brainwave
with roughly 800 FPGAs.
So, the total dataset for the US is about 195 million images, 20 terabytes of data. With Brainwave, we can process the entire dataset in 10 minutes, for a cost of about $43. Okay, let's switch back.
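A quick back-of-the-envelope check on those demo numbers:

```python
# Back-of-the-envelope check on the demo numbers.
images = 195e6           # US land-cover dataset
seconds = 10 * 60        # Brainwave run time
fpgas = 800
rate = images / seconds
print(f"{rate:,.0f} images/s total, {rate / fpgas:,.0f} per FPGA")
print(f"${43 / images * 1e6:.2f} per million images")
```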
So, let's talk a little bit
about compute now and
here's that high-level Azure
architecture diagram that
I showed you earlier.
What we're going
to focus on now is
the compute stack
and specifically,
how we deploy multiple workloads
down onto the platform.
The Azure fabric controller,
like I said, is responsible
for launching virtual machines,
but that's typically
not the way that
customers using
cloud native technologies
will interact with the platform.
They will, of course,
use virtual machines to migrate
their existing IT
workloads in a lift and
shift motion but if
they're building
something cloud native,
they're going to be
using containers
and container orchestrators.
So, that's what you can see here.
There are a couple of container orchestrators that we have bet on. One is called AKS, the Azure Kubernetes Service, which is Kubernetes as a service, Kubernetes being the open source container orchestrator.
It was developed at Google by three people; in fact, one of them, Brendan Burns, now works in the Azure compute team.
So, in Azure, we leverage open source a lot. We contribute to open source a lot, and we've gained enough credibility with the open source community that oftentimes we see people wanting to leave Google to come to Microsoft to work on open-source technologies.
That's kind of shocking for people who have watched Microsoft over the last two decades.
Now, the other service you
can see here is called
Service Fabric.
This is a technology we developed internally, and most of our cloud services are built on top of it.
Most of the Azure infrastructure
is built on top of
this micro-service platform which
can also do container
orchestration.
But it's unique in that it supports stateful services. It's tightly integrated: its federation model and its health model are designed to be highly scalable and to support highly durable storage of data.
When you take a look at a service like Cosmos DB, which is built on top of this, Cosmos DB is replicating data through copies of each database. So, with tight coordination with the Service Fabric controller on health and on upgrades, Service Fabric can make sure that that data is always available for write operations in the face of failures and reconfigurations.
Actually, let me back up here; I'll come back to Service Fabric in a second. For a deeper dive, I thought I'd show you the complexity that we've gotten to in orchestrating something that seems as simple as deploying a virtual machine in the cloud.
Here, you can see all of the resource providers, or at least most of the resource providers, and some of the regional services that are involved with the deployment of a virtual machine, which includes compute, network, and storage.
You can see we've got the idea of clusters down there, clusters with cluster managers that manage groups of servers. So, the tenant manager, that's TM; the data center manager manages the hardware for a cluster; and then there's the network service manager and the software load balancer. All of those are basically cluster-level services: they manage a slice of an entire data center or region, and then down on the servers, we've got agents corresponding to each of these.
When we deploy a virtual machine, here are the flows that you see: the services talking to each other, communicating data, so that we can finally end up with a virtual machine down on a server that has network and storage attached to it.
But going back to Service Fabric,
which is layered on top of this,
where the world is going is
something called
serverless technology.
So, we went from containers a few years ago, which were all the rage, to container orchestrators and microservice platforms, and PaaS on top of microservices. Now, what we see is the next level of abstraction for productivity, which is built on top of serverless technologies.
Serverless was really pioneered by functions as a service.
We've got our own functions as
a service called Azure Functions.
The idea here being that the developer just writes a piece of code, gives it to the cloud platform, and we take care of everything else.
We take care of Auto scaling,
we take care of
the reliability of it.
They focus just on the code,
and they pay for
only the resources
consumed by that piece of code.
So they specify, ''I want
this much RAM for this code,''
and we bill only
for the amount of CPU
consumed by the code.
So, if the code never runs, if the function is registered but nothing ever happens, they don't get charged anything.
If 1,000 invocations of
that function happen,
and they each run for a minute,
then they get billed
1,000 compute minutes.
So, it's pay-per-use,
and really great for
bursty type of workloads.
It's also an event-driven model that is now becoming very popular in the cloud.
Each of these functions is
triggered by some event.
For example, an image being dropped into a storage account could kick off a function that then calls into Brainwave to do inference on that image.
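As an illustration of that event-driven shape, here is a sketch of a blob-triggered function in the Azure Functions Python programming model; the container path and the inference call are hypothetical.

```python
# Sketch of an event-driven function in the Azure Functions Python
# programming model; the container path and inference call are
# hypothetical.
import azure.functions as func

app = func.FunctionApp()

def run_inference(data: bytes) -> str:
    """Placeholder for a call to a real-time scoring endpoint."""
    return "classified"

@app.blob_trigger(arg_name="image",
                  path="images/{name}",
                  connection="AzureWebJobsStorage")
def classify_image(image: func.InputStream):
    # A blob landing in the "images" container triggers this code;
    # billing accrues only while it runs.
    result = run_inference(image.read())
    print(f"{image.name}: {result}")
```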
But what we've got with Service Fabric is that we're taking that serverless concept and expanding it to a general-purpose microservice platform, not just event-driven micro-functions.
In this case, we just
announced the public preview
of this service,
Service Fabric Mesh, and Service Fabric Mesh is serverless Service Fabric.
You can actually take a description of a microservice application, even a stateful one built with abstract programming models like the actor model, and give it to us, and we will go and launch it: we'll take care of the provisioning of virtual machines, we'll launch those containers onto them, and then you pay only for the execution of those containers, and never have to worry about the infrastructure; you focus only on the containers.
So, this is where the world is going: developers only worry about their code, and the platform takes care of everything else.
We're at the next stage of
this and there's still a lot
of work to do in this.
Now, to highlight one of the very productive collaborations we've had with Microsoft Research: we've been collaborating with Microsoft Research on the problem of resource allocation in our data centers since I joined Azure, about eight years ago. In fact, we hired one of the key researchers there into Azure about a year and a half ago; he was instrumental in this.
The collaboration with Microsoft Research continues on this project called Resource Central, which is generally looking at how we make use of resources more efficiently.
That efficiency comes in
a number of different forms,
how do we pack
virtual machines more
tightly and not waste
resources on servers?
How do we make sure that
an evictable workload doesn't
interfere with the placement
of a non-evictable
higher priority workload?
How do we make sure that we don't exceed power budgets in our data centers as CPUs spin up?
All of these are
questions that we
want to get better insights
into out of resource central.
If you take a look at Resource Central, it's heavily dependent on machine-learning algorithms: taking in historical data from production, training on that data, and then generating models that we deploy as offline services that are integrated with the online resource allocator.
They're offline in the sense that
the running system does
not depend on the availability
of those services.
These are optimizations.
One of the principles
in the cloud is,
you want as few dependencies
on the runtime operation of
the service as possible,
because failures always happen.
If there's a non-critical component, you don't want to put it in the hot path.
But you can look at the problems that we're looking at: virtual machine scheduling, and which clusters virtual machines go to.
I mentioned power oversubscription; how we do server maintenance in an efficient way without impacting customers, or minimally impacting customers; and then even giving customers right-sizing recommendations, based on the historical patterns of other virtual machines running similar workloads, and the historical resource usage of that particular virtual machine.
We can say, "Hey, it looks like you could easily fit into a virtual machine that's much smaller, or, you actually need more resources, because you're [inaudible]."
Here's just one example of one of the projects that we've had ongoing. I say ongoing because these are never-ending projects; there's always room for more optimization.
As Donna mentioned,
one percent of optimization
in the right place
can have huge impact.
When you're talking about
millions of virtual machines,
and you're talking about
millions of servers,
saving one percent of
your resource can literally
mean millions of dollars.
So, this resource allocation algorithm that we've developed with MSR assigns each resource dimension a weight, relative to the scarcity of that resource, and then runs an optimization problem where we try to maximize the availability of those scarce resources through our allocations.
You can see there on the right two example allocations. Obviously, the one on the right is the better one for this particular server. If we allocate two low-memory machines, meaning they consume a lot of cores but little memory, we waste a bunch of memory on that server.
But if we take one of those low-memory machines and pair it with another one that's not consuming as many CPU cores, we've got additional CPU cores left for a high-memory virtual machine with few cores relative to its amount of RAM, and we can actually make use of all the resources on the server.
So, that's just a simple look
at the optimization problem.
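Here is a toy version of that scarcity-weighted placement score in Python; the weights and servers are made up, and this is not the production Resource Central algorithm.

```python
# Toy scarcity-weighted placement score (illustrative weights; not the
# production Resource Central algorithm).
def score(free, demand, weights):
    """Lower is better: weighted capacity left stranded after placement."""
    leftover = {r: free[r] - demand[r] for r in free}
    if any(v < 0 for v in leftover.values()):
        return None                      # VM doesn't fit on this server
    return sum(weights[r] * leftover[r] for r in leftover)

weights = {"cores": 1.0, "ram_gb": 0.25}     # cores scarcer than RAM here
servers = {
    "node1": {"cores": 4, "ram_gb": 224},    # RAM-rich, core-poor
    "node2": {"cores": 24, "ram_gb": 32},    # core-rich, RAM-poor
}
vm = {"cores": 4, "ram_gb": 28}              # low-memory, core-hungry VM

fits = {n: score(f, vm, weights) for n, f in servers.items()}
best = min((n for n in fits if fits[n] is not None), key=lambda n: fits[n])
print(fits, "->", best)  # node2: node1 would strand 196 GB behind 0 free cores
```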
Here's another example
where we take a look at
historical eviction patterns
for virtual machines,
and use them to influence
the allocation algorithm.
So, we've got a virtual machine a, and we know b is going to be coming. B is a full-server virtual machine; a is a one-quarter-server virtual machine. There are two servers that we've got to pick from to allocate this, and both can fit virtual machine a.
The question is, which of these is the better fit, if the probability that a virtual machine is going to terminate by time t, when we need to allocate virtual machine b, is one-half? If we deploy virtual machine a on node one, the probability of the two VMs then on node one terminating by time t is one-half squared, and the probability of all three virtual machines on node two terminating, to make room for virtual machine b, is one-half to the third; so the probability that by time t we will be able to allocate virtual machine b is six over 16.
But if we place virtual machine a on node two instead, the odds get better that virtual machine b will find a home.
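A quick check of the arithmetic from that slide (which sums the two node probabilities directly):

```python
# Checking the slide's arithmetic: each running VM independently
# terminates by time t with probability one-half.
p = 0.5

# Place the quarter-size VM "a" on node one (node one then holds 2 VMs,
# node two holds 3). Full-server VM "b" fits if either node empties.
print(p**2 + p**3)   # 4/16 + 2/16 = 6/16, as in the talk

# Place "a" on node two instead (node one holds 1 VM, node two holds 4).
print(p**1 + p**4)   # 8/16 + 1/16 = 9/16: better odds for VM b
```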
So, that's another example of
the resource
allocation strategies
that we've developed in
conjunction with MSR.
You can see we've used a bunch of different machine learning algorithms depending on the problem: for CPU utilization modeling, random forests; for the lifetime of a virtual machine, gradient-boosted trees. The accuracy of these models as we develop them is extremely high.
High enough of
course, that they are
having significant impact on
our efficiency in
the data center.
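As a sketch of what that kind of modeling looks like, here is a gradient-boosted-trees lifetime predictor on entirely synthetic features using scikit-learn; the production features and models are of course far richer.

```python
# Sketch: predicting VM lifetime with gradient-boosted trees on
# synthetic features (a stand-in for the Resource Central pipeline).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features: VM cores, deployment hour, and the mean
# lifetime (hours) of the subscription's previous VMs.
X = np.column_stack([
    rng.integers(1, 64, n),
    rng.integers(0, 24, n),
    rng.exponential(24.0, n),
])
y = 0.8 * X[:, 2] + rng.exponential(4.0, n)   # synthetic lifetimes

model = GradientBoostingRegressor().fit(X[:800], y[:800])
print("R^2 on held-out VMs:", model.score(X[800:], y[800:]))
```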
So, let me now switch
to Azure Storage.
Azure Storage has roughly three layers in its architecture. You can see the front ends there; you can see a partition layer, which partitions the data; and then you can see a distributed file system layer, which handles what are called streams for those partitions. That layer is the one that actually does the replication. The others are effectively the control plane, and the DFS layer is the data plane for the storage service.
On top of this, we've
got load balancers,
and then the storage service
exposes a number
of different APIs.
It exposes a simple blob API, in both block blob and page blob forms; page blobs are something that we use for virtual machine hard disks. Pages are fixed in size; a block blob is arbitrary in size. There are queues, which are simple FIFO queues, and a file API, which includes SMB, as well as HDFS, which we recently announced.
These storage services
support replication
across regions.
So, if you deploy something with geo-redundant storage, or GRS, then what you've got are the partition layers doing the replication.
Now, like many of the pieces of
Azure that we started
with 10 years ago,
we've had to
re-architect the system
as we've gotten to
a larger and larger scale.
The initial storage architecture was based on storage stamps, or clusters: a fixed number of physical servers. What you get when you've got a large data center, a region with many, many stamps, is inefficiencies and scale limits.
The scale limits are the limits of the servers in that one stamp, or the maximum limits that we can offer customers for throughput and size of their storage service. Then, it's inefficient in terms of fragmentation, because we've got to have failure buffers in each cluster and expansion buffers in each cluster, and we can't amortize those buffers across all of the servers in the data center region.
So, we started re-architecting
the storage system about
three years ago actually,
and we are putting this
into production now.
This is called Gen2
Azure Data Lake Storage.
We call it internally
limitless storage,
because if you take a look at some of the scale limits, a storage account can now do 50 gigabytes per second, as opposed to the 60 megabytes per second that we used to have, and we're going to see these limits go even higher. The throughput spikes are instantaneous, and there's no warm-up for these.
For scale, our target is that one storage blob could consume every single byte of storage in an entire region. That's the kind of scale limit we're going for, hence the name limitless.
You can see the throughput that we can now drive by taking advantage of as many scale-out storage servers as possible in this architecture: we can upload a five terabyte blob in 14 minutes, where it previously took, you can see, 12 hours.
So, massive improvement in
the throughput that we get
with this new architecture.
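The implied throughput, as a rough calculation:

```python
# Rough implied throughput for the 5 TB blob upload comparison
# (decimal terabytes, just for a ballpark).
blob_bytes = 5e12
print(blob_bytes / (14 * 60) / 1e9, "GB/s with the new architecture")    # ~6
print(blob_bytes / (12 * 3600) / 1e6, "MB/s with the old stamp limits")  # ~116
```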
But we're also looking with
MSR at the future of storage.
We've got hard disks,
we've got SSDs,
we also have tape now,
and we've got also
something called Pelican,
a project where we've got
hard disks in basically a
hard disk appliance form,
a rack of hard disks
where only some of
the hard disk can be
spin up at a time as
a low-cost archival
storage solution with
higher latency than
a single hard disk solution.
But we want to do better.
It's all about
maximizing efficiency,
especially for archival
storage which can be
around for long periods of
time without ever being accessed,
you want that to be
as cheap as possible.
The world is now storing
massive amounts of data.
So, we've got two projects
going on with MSR.
One is called Silica, which is looking at glass; I'll explain that one. Another is called Plex, which is looking at the use of DNA to store data.
I don't have time to talk
about Plex this morning,
but that's another
a very exciting project
that offers the promise of
storing a full zettabyte of data
in a single rack in a datacenter.
Just to put that
into perspective,
the estimate is
that there's around
20 zettabytes of data
on the planet today.
So, 20 racks and you could store
all the data in
the world on Plex.
But going back to glass, let me talk a little bit about it. This is a project with the University of Southampton, a collaboration with MSR Cambridge and us in Azure.
The idea here is to store
data in glass, obviously.
Now, that might seem
straightforward,
just etch the data
into the glass.
The problem is up to now,
there's been no way
to efficiently
store data at high
density in glass.
Because whenever somebody
tries to etch it with lasers,
they end up with defects.
The defects caused by
material deformations because
the laser pulses are
damaging the atomic
structure of the glass.
So, you might write something
but you can't read it back.
What the University of Southampton and Microsoft Research have been working on is using high-speed lasers that operate in the femtosecond range, which is the same pulse duration that you might have cataract or lens surgery with. How long is a femtosecond? It's really short, a quadrillionth of a second; I had to look it up.
With a femtosecond laser, we can pulse and generate what are called voxels in the glass, which are three-dimensional structures; you can see here that they have some depth.
By targeting laser at
different depths in the glass,
we can create multiple layers of
this in one piece of glass.
I was given a sample yesterday by the Project Silica team that has 76 layers in it. So, we've gotten to where we can store very efficiently vertically as well as horizontally, with five-micron separation between these voxels.
Here's a look at
the reader for this.
The interesting thing about glass is that once you write it, you never have to refresh it; it lasts for literally millions of years, or billions of years, really. So, it outlasts probably even the earth.
You don't need a specialized reader or specialized reading technology to read it; all you need is light. So, by flashing light on this and looking at the reflections, at how that light diffracts off those voxels, you can read the data off of it.
So, this is our early look
at a read head.
You can see that this one
is going through 20 layers,
showing a simple image
of those 20 layers,
so each of those depths.
Then, here's a look at how we decode the data. We take multiple images of a layer using different conditions, and then we look at the retardance and the orientation; we can encode data in those two values. You can see here an example of encoding data on that glass at three bits per voxel.
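To make "three bits per voxel" concrete, here is a toy encoding in Python where two bits pick an orientation and one bit picks a retardance level; the values are purely illustrative, not the actual Silica scheme.

```python
# Toy 3-bits-per-voxel encoding: two bits select one of four
# orientations, one bit selects a retardance level (illustrative
# values, not the actual Silica scheme).
ORIENTATIONS = [0, 45, 90, 135]   # degrees
RETARDANCE = [0.3, 0.7]           # arbitrary units

def encode(bits3):
    assert 0 <= bits3 < 8
    return ORIENTATIONS[bits3 >> 1], RETARDANCE[bits3 & 1]

def decode(orientation, retardance):
    return (ORIENTATIONS.index(orientation) << 1) | RETARDANCE.index(retardance)

assert all(decode(*encode(v)) == v for v in range(8))
print([encode(v) for v in range(8)])   # all 8 symbols round-trip
```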
What this promises in terms of density: it can store about 50 terabytes of data in a square centimeter. So, this is orders of magnitude more dense than the kinds of archival storage that we have had up to now.
This would be a huge breakthrough
for archival storage.
It's one of those long-term projects we've got with MSR: multiple years of work and a significant amount of investment, of course, in material technology and equipment that needs to be taken from a lab prototype and made data center production-ready. And we're doing it all in collaboration.
So, that brings me to
the conclusion of the talk.
Here, I gave you a brief overview of the high-level Azure architecture and a look at some of the interesting MSR projects.
The bottom line is that when it comes to research in the cloud, besides, of course, innovative programming models, innovative topologies, and communications architectures, much of the research that we focus on is driving up efficiency, scale, and performance.
You can look at the projects that I showed you; they're all aimed at efficiency, scale, and performance.
They're just some of the projects
that we've got going with MSR.
I mentioned Plex; we've got another one on optical networking called Sirius; and we've got another one that you're going to hear about here, if you go to Albert Greenberg's talk, on CrystalNet, which simulates our wide area networks so that we can make configuration changes safely.
The list goes on and on.
Like I said, it's a competitive advantage for us to have MSR here to go out and explore these kinds of areas.
In some cases, these are ideas that seem far-fetched to us when we initially look at them and that offer a bunch of research problems, like Project Natick, but if they are realized, they could really transform our business.
So, with that, I
want to wrap it up.
Again, I want to
thank Eric and Donald
and Dan for inviting me to open
up this faculty summit with you.
I hope you have a fantastic few days here; you've got a bunch of interesting talks ahead of you. Thank you very much.
