[MUSIC]
>> Welcome to The Control Room.
I'm Scott Bounds, a Media Cloud
Architect with Microsoft
based in New York City.
Today, I'm coming to you from
the Catskill Mountains
in Upstate New York.
In today's webinar, we'll be discussing
rendering and visual
effects using Azure.
I will be joined by a couple
of colleagues of mine from
Microsoft who are based in
Pittsburgh and Los Angeles,
a real example of remote
production in today's environment.
We have a jam-packed agenda,
so let's get started.
[MUSIC]
>> I'd like to kick
off this webinar with
a discussion about what
we are hearing from
our customers about rendering and
visual effects workloads in Azure.
I'd like to welcome to this episode,
Greg Denton and Anthony Howe,
Program Managers with
Azure Engineering.
Greg recently joined Microsoft
after years in the industry,
and Anthony is a 10-year
veteran of Microsoft.
Hey, guys, what are you hearing
from our customers about
their key concerns when
looking at rendering and
visual effects workloads?
>> Thanks, Scott. Yeah,
so being new to Microsoft,
I'm definitely interested
to hear Anthony's
perspective here.
Some of the issues that I hear
repeatedly from the industry,
probably the biggest one
is really just about
ease-of-use and automation
and being able to
deploy quickly to the Cloud and in
a repeatable and reliable fashion
using tools like Terraform,
things like that.
So Anthony, any thoughts?
>> Yeah. Thank you,
Greg, and thank you, Scott.
So that is what we've been
hearing from the customers.
Definitely the ease-of-use,
the automation.
Security is a big piece
in the sense that
the M&E industry requires
tight security and tight
security standards,
and as part of that automation,
the Terraform can help
describe the infrastructure.
So it makes it easy to
get through the audits.
So yes, exactly.
We've been hearing those major areas.
>> Awesome. What about auto-scaling?
How does Terraform
really tie into that?
What tools are available
from Azure to be able
to just quickly spin up
a bunch of instances in
a repeatable fashion?
>> Yes, so for Azure,
we have simple solutions all the
way up to complex solutions.
So for some of our customers
that need to get up to,
let's say 40,000 cores,
they'll actually just use
a VM scale set.
All those machines will register into
the render manager, and then they'll
scale the set
down themselves.
Some of our other customers
need more complexity,
so they will use a solution
like Azure CycleCloud,
and Azure CycleCloud provides
the necessary mechanisms
for auto-scale up and
auto-scale down based on integration
with your render manager.
So whether it's OpenCue,
Tractor, or any of the other
popular render managers.
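The autoscale integration Anthony describes can be thought of as a simple control loop over the render manager's job queue. Here's a minimal sketch in Python; the function and all its names are hypothetical illustrations, not CycleCloud's actual plugin interface:

```python
# Hypothetical sketch of a render-farm autoscale decision, in the spirit of
# CycleCloud's render-manager integration. A real plugin watches the render
# manager's job database; here we just compute a target node count.

def target_node_count(queued_frames, frames_per_node, min_nodes=0, max_nodes=500):
    """Scale the farm to cover the queue, clamped to the farm's limits."""
    # Round up: a partially filled node is still a whole node.
    needed = -(-queued_frames // frames_per_node)
    return max(min_nodes, min(needed, max_nodes))

# As jobs arrive the target grows; as the queue drains it falls back to the
# minimum, which is what lets the farm scale down when idle.
print(target_node_count(0, 10))      # idle farm
print(target_node_count(1234, 10))   # mid-size job
print(target_node_count(99999, 10))  # clamped at the farm's maximum
```

The scale-up and scale-down paths are the same calculation, which is what makes the behavior repeatable: the farm size is always a pure function of the queue.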
>> Does Tractor
support that directly?
Are there plugins
available for it right now?
>> Yeah. If you visit
the Azure CycleCloud GitHub site,
you will see
the Tractor plugins
available right there.
So essentially to get
started along that track,
if you wanted to integrate it,
you would install
the CycleCloud node,
which is a controller node.
Then in that node,
you would install your plugin,
and you can download that
right off the GitHub site,
and it's the Tractor plugin,
and that'll hook in and watch
the Tractor render manager
database for new jobs,
and as new jobs come in,
and depending on how large they are,
it will scale up to
what's required to run.
>> That's awesome. Very cool.
One other point that ties
into that auto-scaling
is around licensing.
I know one problem that
comes up repeatedly
is a studio has a set of
licenses available on-prem,
and that may be a fixed number
of licenses and they need to
then either rent licenses or pay
burst licensing fees in
order to move to the Cloud.
It's a problem that I've seen at
numerous studios and I
think it's just something
that is a good opportunity for us at
Azure to really come
up with a good solution for.
Do you have any thoughts?
>> Yeah.
Exactly. I think in Azure, we're
progressing to a good solution.
But right now, we have
two ends of the spectrum.
First is the customer can buy
additional licenses
to burst to Azure,
and this requires them to really
understand the load that
they're going to run on Azure,
how many machines are
going to run on Azure,
and then they buy that amount
of licenses for that month.
The alternative is, on
the other spectrum,
in our Azure Batch rendering service,
we do have licensing in
there for Maya and 3ds Max,
and essentially, that's really
good if you're going
to do short renders.
But if you go beyond
three days in a month,
it's more cost efficient to buy
your monthly licenses and tie
them into your license server.
Some of our customers
actually just stand up
a separate license
server for the Cloud,
and then they maintain that separate
license server month by month
and it gives them a really clear
picture on what they're using.
But we want to evolve
to something in the middle.
Another trick customers
have been doing,
and it's within the
license agreement,
is that the licenses are per-machine,
and they can actually
use some of our really dense SKUs,
so we have some of the
96-core AMD machines.
They can actually use that and make
very efficient use of those licenses.
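The two licensing trade-offs Anthony describes, pay-per-use versus monthly licenses and per-machine licensing on dense SKUs, come down to simple arithmetic. A sketch with made-up prices (every dollar figure here is illustrative, not actual vendor pricing):

```python
# Back-of-envelope licensing comparison. All prices are invented for
# illustration; check real vendor and Azure Batch pricing before deciding.

def break_even_hours(monthly_license_cost, hourly_license_rate):
    """Node-hours per month beyond which buying a monthly license wins."""
    return monthly_license_cost / hourly_license_rate

def cost_per_core(license_cost, cores_per_machine):
    """Per-machine licensing gets cheaper per core on denser SKUs."""
    return license_cost / cores_per_machine

# A hypothetical $300/month license vs. a hypothetical $1.50/node-hour rate:
hours = break_even_hours(300.0, 1.50)   # 200 node-hours
# The same per-machine license on a 16-core vs. a 96-core SKU:
sparse = cost_per_core(300.0, 16)       # 18.75 per core
dense = cost_per_core(300.0, 96)        # 3.125 per core
print(hours, sparse, dense)
```

This is the logic behind both recommendations: short bursts favor pay-per-use, sustained rendering favors monthly licenses, and a per-machine license stretches furthest on the densest machine you can fill.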
>> Cool. Yeah. Speaking
of licensing and cost,
another thing that
comes up all the time
is really around cost
control and metrics
and being able to see what your spend
is in as close to
real-time as possible,
being able to analyze and ultimately
really predict what your
budget is going to be.
There's so much work that
goes into just trying
to forecast what your workloads
will be ahead of time.
I know producers would love to
be able to see more information,
more dashboards around
what their spend
is and how it's affecting
their show budget.
What does Azure have
to offer in that area?
>> Actually, that's a
really exciting area that
in the last year has really seen
a lot of growth and innovation,
and that's our Azure Cost Management.
Microsoft acquired Cloudyn a couple
of years back,
and that got embedded
in the portal, with more
innovations on top of it
over the last two years.
Now, we have an API
that we expose and we
have a whole Azure Cost
Management infrastructure.
So you can take all that data.
Now, as customers, you tag
your renders and you tag
your burst rendering resources.
You can tag per scene,
you can tag per project.
Then that all goes into the
cost reporting database,
and then you can either
query those APIs from
your own cost reporting
infrastructure, or you can
actually use some of the
new Power BI dashboards.
The Azure Cost
Management Team actually
has a variety of templates now
that right out of
the box give you very powerful
views into your spend,
and you can tell right away and catch
unexpected spend that is really
going to run up your bill.
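The tag-then-report flow Anthony outlines is essentially a roll-up of cost records by tag key. A toy sketch of that idea; the record layout here is a hand-written stand-in, not the actual shape returned by the Azure Cost Management API:

```python
# Sketch of tag-based cost roll-up: tag renders per scene or per project,
# then sum spend by tag. Records are invented stand-ins for API results.
from collections import defaultdict

def spend_by_tag(records, tag_key):
    """Total cost grouped by the value of one tag key."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["tags"].get(tag_key, "untagged")] += rec["cost"]
    return dict(totals)

records = [
    {"cost": 120.0, "tags": {"project": "showA", "scene": "sc01"}},
    {"cost": 80.0,  "tags": {"project": "showA", "scene": "sc02"}},
    {"cost": 45.5,  "tags": {"project": "showB", "scene": "sc01"}},
]

print(spend_by_tag(records, "project"))  # {'showA': 200.0, 'showB': 45.5}
print(spend_by_tag(records, "scene"))    # {'sc01': 165.5, 'sc02': 80.0}
```

The same grouping, done over live API data inside a Power BI template, is what gives producers the per-show spend views mentioned above.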
>> Awesome. Yeah, the Power
BI dashboards there,
I can imagine, are super
powerful and really helpful.
So that's great to hear.
Another area that has prevented
a lot of studios from making
effective use of the Cloud,
I think, is really around
storage management.
No one wants to sit around
tracking file copies
from point A to point B,
moving data across different
regions in the Cloud,
uploading files to the
Cloud and pulling
them back down manually.
It's a huge burden and has
often been pretty error-prone.
I know with Avere that
workload is a lot simpler.
Can you give me any details
around what advances and what
progress we've made there?
>> Yeah, so for storage,
back on January
2nd of 2018 there was
the acquisition of Avere
as a cache into the Cloud,
and after the acquisition,
again, there have been innovations
from the Avere team.
There's the vFXT product
that's on Azure, and
a managed product that takes
that vFXT technology and
turns it into a managed
service called HPC Cache.
Essentially, the vFXT and the
HPC Cache run the Avere OS,
and what the Avere OS enables
is that cache in the Cloud,
so it brings the data close to your
running nodes in the Cloud,
and there's only ever a
single source of truth.
The nodes reach out to
the Avere for the data,
and if the Avere doesn't have it,
it goes back to on-prem.
So what this enables is fast
time to access that data.
Plus, you never have the
synchronization issues.
So if you're running
periodic synchronizations,
you don't have race
conditions on the data.
You don't have stalled
synchronizations
because something happens with
the synchronization tool.
The Avere handles
that all for you and
actually makes it super easy
to implement the burst
rendering into the Cloud.
We'll give a demo a little
later on some of
the new technologies that have
been built on top of the Avere.
>> Awesome. Yeah, that'll
be super helpful.
Just not having to manage
all that data yourself
and really have to worry about
it is just a huge relief.
>> One other additional thing that
customers really like about that is,
when you do synchronize data,
you end up with two copies;
not only is there the
single-source-of-truth problem,
but having the second copy
becomes a security issue.
Now, they have to worry
about this data that
can potentially become
an exfiltration problem.
With the Avere there,
it only brings up
the data that it needs,
and once all the rendering
is done and you delete it,
the Avere goes away and
all that data goes away with it.
Now, you don't have that
extra security point.
This is another
advantage of the Avere,
is you don't have to worry about
a second data sync to secure.
>> Got it. I imagine as
productions are getting
more and more spread worldwide
and moving data between studios,
being able to render
in separate regions,
to try to maximize
your pipeline efficiency and
cost and things like that,
being able to just spin up in
Avere in a separate region
and get to work rendering is
a great benefit there too.
>> Yeah, exactly.
>> That's awesome. Well,
thank you, Anthony and Scott.
>> Yeah, thank you.
[MUSIC]
>> Now, let's dig into technical
details of render on Azure.
One of the first questions
we're asked is, why Microsoft?
Looking at the regional footprint of
Azure and our hyperscale
architecture,
you can see why.
Microsoft has 61 announced regions.
These are across the US, Europe,
and Asia, and also include one of
our newest announced
ones in New Zealand.
This regional footprint gives
the connectivity as well as compute
wherever customers need it.
Also, Microsoft runs its own network.
All of the regions are actually
connected with fiber
optics run by Microsoft,
so we can really provide you
low-latency and also fast transfers
between regions as you look
at your different workloads.
Now, let's welcome
Anthony back and start
to dig into some of
these other questions.
Anthony, what are the problems
that customers are really
trying to address?
>> Thank you, Scott.
Today, I'm going to
highlight one of the main problems
that customers are facing.
Customers are facing a problem
of wanting to easily describe,
manage, and deploy their
on-demand infrastructure.
They want to be able to use
the same tools on-prem
as in the Cloud.
In our architecture diagram here,
you can see the major
components on-prem,
which is the on-prem render farm,
the network attached storage,
and also the artists.
Then in the Cloud,
the major components are
going to be the ExpressRoute,
the gateway, the storage cache,
and also the render farm.
Customers come to us and say,
"We don't want to manually
stand each of these pieces up,
we want one solution
to manage them all."
The solution that we've
heard most about from
the studios in the last
year has been Terraform.
The customers love Terraform.
They love the fact
that Terraform can
describe all of the
infrastructure in Azure,
plus it can also help them describe
their infrastructure
on-prem and be able
to have a single solution
to manage across on-prem,
Azure, and any other platform.
Another great thing about Terraform
is that it's infrastructure as code.
This means that the Terraform
can all be checked into
your source repository,
and now being part of
that source repository,
can now be part of
your DevOps pipeline.
A very popular technique
that we're seeing more and more in
the M&E industry is
this notion of GitOps,
which takes the DevOps concept one
step further and uses Git to
manage your Terraform.
The pipeline will take the plan,
somebody will approve
the Terraform plan,
and it will then go deploy it
on behalf of the user.
They really like that integration
with the DevOps pipeline;
it enables that agility,
plus it enables auditing and
history tracking for when the
security audits come by.
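That GitOps flow, propose a plan, get human approval, apply it, keep an audit trail, can be sketched as a tiny state machine. This is purely illustrative; real GitOps tooling drives Terraform itself from the Git repository, and every name below is invented:

```python
# Minimal sketch of the GitOps gate described above: a proposed Terraform
# plan is applied only after approval, and every decision is logged so the
# history survives for security audits. All names here are hypothetical.

def gitops_apply(plan, approver, audit_log, apply_fn):
    """Apply a proposed plan only if approved, recording who approved it."""
    if not approver:
        audit_log.append(("rejected", plan["id"], None))
        return False
    audit_log.append(("applied", plan["id"], approver))
    apply_fn(plan)
    return True

log = []
applied = []
gitops_apply({"id": "pr-42", "changes": "+3 vms"}, "alice", log, applied.append)
gitops_apply({"id": "pr-43", "changes": "-1 vnet"}, None, log, applied.append)

print(log)           # the audit history survives for security reviews
print(len(applied))  # only the approved plan was applied
```

The point is that approval and deployment leave the same trail: the Git history is the audit record.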
Another thing that
Terraform enables is a lower
total cost of operation.
With automation, you can
scale up really quickly,
scale down really quickly,
and only pay for what you use,
which speaks to the on-demand
running of services.
M&E rendering studios run
on razor-thin margins,
so you really want to always be
turning off your infrastructure
when not in use,
saving that infrastructure costs.
Terraform has provided all of this.
Based on this, we saw the signal
start to appear from
the studios last April, April 2019.
As we started hearing more
and more, we said, okay,
we're going to start building
out the Avere Terraform:
we're going to build out
the examples, providers,
and modules to help with
burst rendering, starting
with the Avere piece.
I'll show you the Terraform site.
This is the Terraform site on GitHub.
It's github.com/azure/avere,
and then you click on
the Terraform piece,
and it brings you into the
examples, modules, and providers.
This provides different ways of
setting up your infrastructure.
It assumes different
storage back-ends.
You can see examples here like
generic NAS filers and
Azure NetApp Files.
I'll go through four
examples on this website.
Once we implemented this,
we saw immediate adoption
from the studios,
and very positive customer feedback.
Some of our customers had never
used Terraform before,
but their organization
wanted them to use it.
They came and started using the examples,
and within a day, they were running
a full on-demand suite
that they were able
to spin up and then tear down.
It's really led to great
customer engagement.
>> Anthony, one of the questions
I hear from customers is, how do
I migrate my on-prem
investments to Azure?
>> One of my favorite
examples on our Terraform site
is the config backup and
restore of the Avere piece.
A lot of our customers
running renders on-prem are
running the Avere on-prem.
The easiest way to get them
started in the Cloud is to run
the config restore script that
takes a snapshot of the Avere,
and then using our
Terraform provider,
we automatically produce
the Terraform template
for HPC Cache and
the Terraform template for
the Avere vFXT on Azure.
This is really useful
because that snapshot
captures all
their custom settings,
all their core filers,
and everything
that they've mounted.
Then when they go to
deploy it on Azure,
as long as they have their
ExpressRoute and networking setup,
everything mounts back just
like it's running on-prem.
This gets them started along
that path; then they
realize how easy it is to
start render nodes, and they can
import those into the Cloud.
There's mechanisms for importing
the custom image into the Cloud,
and that gets them
started along that road.
This is one of our
examples for our site.
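The backup-and-restore idea is worth making concrete: snapshot the cache's settings as plain data, then emit a Terraform-style template from it. The sketch below is invented for illustration; the field names and output format are not what the real tooling on the github.com/Azure/Avere site generates:

```python
# Sketch of snapshot-to-template generation, the idea behind the Avere
# config backup and restore. Everything here (resource names, fields) is
# a hypothetical stand-in for the real HPC Cache / vFXT templates.

def snapshot_to_template(snapshot):
    """Render a captured settings snapshot as a Terraform-style block."""
    lines = ['resource "render_cache" "restored" {']
    for filer in snapshot["core_filers"]:
        lines.append(f'  core_filer "{filer["name"]}" {{')
        lines.append(f'    nfs_export = "{filer["export"]}"')
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)

snap = {"core_filers": [{"name": "nas1", "export": "/vol/projects"}]}
template = snapshot_to_template(snap)
print(template)
```

The appeal is that the snapshot captures every mounted core filer once, so the Cloud deployment mounts back exactly like on-prem with no hand-copied settings.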
>> Another question that we get is,
how can we improve remote
artist workstation performance?
>> Yeah. This was
a very interesting need that
came up in February and March.
Customers came
to us and said,
"How do we improve
our workstation performance
against our on-prem NAS filers?"
The Avere usually comes with
a read cache by default that
accelerates your rendering.
But there's also a
mode called read-write
with a 30-second writeback
delay and attribute checking,
also known as the Collaborating
Cloud Workstation policy.
This mode is great for
artists that are on the
other side of the country,
or in other regions,
and need to overcome that latency
back to the on-prem filers.
What one of our customers did
was set up the
solution in Terraform
and make it part of their
DevOps pipeline,
so when the artists come
in in the morning,
the infrastructure they're able to use,
which includes their workstations
and the Avere, stands up,
and at the end of the day
when the artists go home,
the whole thing turns off.
It was a really nice integration point,
plus it provided that nice performance
for the artists during the day.
>> Anthony, a concern
we've been hearing from
our customers is about the costs
associated with virtual machines
sitting idle. Is there a way
for the cache to warm up?
Do you have any solutions
to address this?
Do you have any solutions
to address this?
>> Yeah. One of our
customers came to us and
said they wanted to save the
most amount of VM time.
They had frustration over
their VMs just sitting
idle waiting for the caches to warm.
We said, "We can help you with that
by providing a cache
warmer solution."
On our Terraform site,
we have this cache warmer solution,
it's integrated into
the Terraform pipeline.
What it will do is, as the
infrastructure and the Avere
are brought up,
you can submit a job
to the cache warmer
with the directories
that you want to warm.
The cache warmer will proceed to
start a VM scale set that will be
massively parallel and pull all
the files through up to the cache.
That VM scale set will
run as low priority,
with OS ephemeral disks,
and be the lowest-cost way
to warm that cache.
As soon as that's complete,
the VM scale set gets torn down and
the Terraform job
submission completes,
allowing then the render nodes to
start and then use that warm data,
so they're not sitting there
idle waiting for that data.
Now, one of the questions
you may ask is, "Well,
why can't the render
nodes just act as
the VM scale set
that you brought up?"
What happens is,
the render software usually
opens the files serially,
or if there is a bit of parallelism,
it doesn't open up
enough parallelism.
So that's why the Cache Warmer does
that maximum parallelism
to pull the files through,
and then when the
render nodes do start,
they get rendering right away and
the time to their first
pixel output is super quick.
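The difference between a serial warm-up and the cache warmer's "maximum parallelism" is easy to see in code. A minimal sketch; the fetch function below just fakes a read through the cache, and the worker count is an arbitrary illustrative choice:

```python
# Sketch of why a dedicated cache warmer beats letting render nodes warm
# the cache themselves: it issues reads with high parallelism, while
# render software tends to open files serially. Fetches are faked here.
from concurrent.futures import ThreadPoolExecutor

def warm(paths, fetch, workers=32):
    """Pull every path through the cache with many parallel readers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, paths))

fetched = []
def fake_fetch(path):          # stands in for a read through the cache
    fetched.append(path)
    return len(path)

sizes = warm([f"/frames/{i:04d}.exr" for i in range(100)], fake_fetch)
print(len(sizes), len(fetched))   # all 100 files pulled through
```

With real network latency in play, 32 in-flight reads finish roughly 32 times sooner than one-at-a-time opens, which is why the render nodes find warm data waiting for them.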
>> Now that my data is
in Azure in a region,
how can I actually use that
across the global footprint
that we talked about earlier
to render in multiple regions?
>> The Avere allows you
to extend your data.
It allows your data to
reach into other regions.
So for example, it extends the
reach of Azure NetApp Files.
So for some customers who
have rendering capacity
in another region,
or there are SKUs in another
region they want to
use with their source data,
like some of the AMD SKUs,
they can keep that source data in
the existing Azure
region where it is,
but then deploy the Avere
into the destination
region and reach across.
This solution here uses VPN Gateway.
VPN Gateway is required
for Azure NetApp Files.
You can also use a
VNet peering if you're
using a solution different
from Azure NetApp Files.
But again, to reinforce
the advantage of Avere,
it preserves that
single source of truth,
so no need to replicate data.
The other cool example is,
the Avere hides latency.
This is a real render demo.
We're going to render
10 seconds of a scene.
The compute power is in West US 2,
and the source data is in the UK.
So we're going to render with 500
nodes this 10 seconds of frames,
and you can see the latency
is 130 milliseconds.
This shows the difference
between using the Avere,
which is the HPC Cache
that you see here,
and you see the difference
without the HPC Cache.
So you can see quite
a time difference,
more than twice as fast,
and also the peak gigabyte
per second is way higher.
That's just because the
data is close to the nodes.
This leads to our third point,
that it makes efficient
use of the bandwidth,
so that it's only pulling
through the data once.
Again, the Avere can be deployed
in any region around Azure,
any of those 61 regions
that Scott showed earlier.
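The demo's numbers make the latency-hiding point quantitative: with a 130 ms round trip, every uncached file open pays that latency, while a read-through cache pays it once per file rather than once per node. A rough sketch; the latency figure is the demo's, the per-node file count is invented for illustration:

```python
# Rough arithmetic behind the demo: 130 ms round-trip latency from
# West US 2 to the UK, paid on every serial uncached file open.

RTT_S = 0.130  # 130 ms round trip, as stated in the demo

def open_overhead_seconds(files_opened, round_trips_per_open=1, rtt=RTT_S):
    """Cumulative latency cost of serial file opens across the WAN."""
    return files_opened * round_trips_per_open * rtt

# A node opening 1,000 small assets serially loses about two minutes to
# latency alone; through a warm cache, that cost is paid once for the
# whole farm instead of once per node.
print(open_overhead_seconds(1000))       # seconds of pure waiting
print(open_overhead_seconds(1000) / 60)  # the same, in minutes
```

That per-node, per-open penalty, multiplied by 500 render nodes, is where the "more than twice as fast" result in the demo comes from.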
>> I'd like to welcome Greg back.
Greg, can you give us some
insight on OpenCue and
other activities at the Academy
Software Foundation around rendering?
>> Totally, thanks. So OpenCue is
a render management system
that is entirely open source.
It was originally developed
at Sony Imageworks,
and about a year ago,
it was made open source and
brought it into the Academy
Software Foundation.
As part of that,
we've been continuing to build it out
as an SWF product and a project,
and we're continuing to add
in more Cloud functionality.
It's really been gaining a lot of
momentum and interest
in the industry,
and on the Azure side,
now we're really interested in
tying it more into
some of our products,
being able to do things
like we've talked about,
being able to autoscale and quickly
deploy the whole environment
because it ends up being a fairly
complicated setup sometimes,
you have a central database server,
you have multiple agents running
both on-prem and in the Cloud.
You can have really
complex hybrid deployments
that need to be managed,
as well as just being
able to make sure to
deploy all of your other
third-party apps with it and run it
in a way that doesn't really
feel like you're running
a render farm in the Cloud
and a render farm locally,
but rather a single
central point to manage all
of your rendering needs.
So it's a really great project
and I believe Anthony is
starting to look into
some future work to help
with easy Terraform
deployments to Azure.
>> Yeah. Thank you,
Greg. Just to add to
our Terraform site that we showed
in our previous demonstration,
we built out part
of the infrastructure
required for a burst render,
and we are going to complete it.
So you should keep checking
back on the Terraform site;
we're going to have
a full end-to-end deployment with
Terraform that includes OpenCue,
the render manager, the UI piece of it,
and also ties in with being
able to scale up a VM scale set.
We'll have the VM scale
set piece there initially,
because that's the
simplest one to get going.
Then further down the road,
we're going to tie in some of
the Azure Cost Management
that we talked about,
and then we're going to go
one level of complexity up.
That's the North Star:
getting Azure CycleCloud in there,
so we can have true autoscale
up and autoscale down.
So stay tuned on the Terraform site.
>> Very cool. That'd be
great. Thank you, Anthony.
>> Anthony, could you give us
a quick overview of all the
resources that are available for
customers interested in
starting rendering on
Azure for a POC or a pilot?
>> Yes. So this page here shows
a lot of the great resources
we have available for you.
To start off, we have
our first render pilot,
which will show you how to
get started quickly on Azure.
The next is our Terraform
site that we've talked about.
We showed four examples today,
but it has 20-plus modules and
examples on that site,
and it's always growing.
So keep refreshing it;
you're going to see more
and more content on there.
You're also going to see our
rendering white paper.
It goes over the key
areas customers have
talked about for why they need
to move to the Cloud:
cost management,
security,
and ease of use.
Then also, we have
a great article by Rick Shahid on
GitOps for Azure rendering,
which goes into using
your existing DevOps pipeline
and building out the rendering
infrastructure as part of it,
and we also have two case studies,
one for Mr. X and one from
Outpost of their work on Azure.
Finally, we have resources
around HPC Cache.
You can go learn
about how you can use
the managed service HPC Cache
to deploy into the regions,
and then also CycleCloud,
and how you use that to scale
your render nodes to hundreds
of thousands of cores.
>> Anthony, after watching this
webinar and reviewing the resources,
what should be the next step for
a customer interested in moving forward?
>> Great question. I would say that
the next step really is to dive
into our first render pilot.
The first render pilot
has two phases in it,
and what we say to customers is,
for your first two phases of jumping
into rendering, keep it simple.
So phase 1 is to get a single
frame rendered on Azure.
It sounds simple,
but there's actually
a lot involved:
getting your network set up,
getting your software
onto a custom image,
getting that custom
image running in the Cloud,
and getting it registered
into your render manager.
So that's phase 1.
Once you've ironed out those details,
you're now ready to scale.
Again, we say, keep it simple.
So what we say is use a VM scale set,
and then from the VM scale set,
also use the Avere cache
and then render a scene.
That's the end of your phase 2.
Our first render pilot document
goes into that in more detail,
along with the technologies involved,
to get you started.
>> Well, that's a wrap
for this webinar.
I hope you found all
the information, demos,
and scripts very useful.
Please provide us your
feedback and comments,
and also please check out
all the other webinars
as part of The Control
Room. Thank you.
[MUSIC]
