TODD PRIVES: Hey, everyone.
Thanks for coming to this
afternoon session, where
we'll be talking about
visual effects in animation
rendering on GCP.
My name's Todd Prives.
I'm a product manager
here at Google,
and I focus on solutions for
the media and entertainment
vertical.
So during this
particular session,
we've got some
Oscar-caliber customers who
are going to share with you
their experience of using GCP
to render imagery that's high quality and cost effective, that helps them deliver projects on time, and that, most importantly as far as visual effects and animation go, serves the creative purpose of the story.
So I'll start by giving
you a brief overview,
talk a little bit
more about what
rendering is at a high level.
We'll look at some of
the things that GCP
has done to enable
studios to achieve really
great high-scale rendering.
And then we'll hear
from Monique Bradshaw.
Monique's VP of
business development
and operations at ConductorIO.
They're a cloud rendering
platform running on GCP,
and their customer,
Atomic Fiction,
just put through several
million hours of rendering
on GCP on the film "The Walk."
And this is a film from
Oscar winner Robert Zemeckis,
and the film itself
was shortlisted
for the Oscar Bake-off for
best visual effects this year.
After Monique, we'll hear
from Brennan Chapman.
He's the head of pipeline
at Moonbot Studios,
they themselves an
Oscar-winning studio
in Shreveport, Louisiana.
And Brennan is going
to go ahead and share
his experience of using GCP
to set up a hybrid environment
for rendering, where
he's been able to use
his on-premise
resources and then burst
to GCP for excess capacity
and really demanding jobs.
So at a high level,
what is rendering?
Well, there are several
different variations,
but really for the purpose
of media, entertainment,
and animation, it
comes down to this--
taking a model inside of
a 3-D application that's
been built by an
artist, and taking
all of the data associated
with that model--
so that could be textures.
That could be geometry.
That could be assets.
That could be animation.
That could be lighting.
Imagine that to accurately light this, you could need to set up thousands of points of light.
So it's taking all of this data and number crunching it to give you a final, fully realized, photorealistic output such as this.
So this is a single frame
here, and to run this render
on a 32-core machine
takes about an hour,
just to give you a little bit
of context of what this one
particular rendering job is.
But keep in mind the
scale of most projects.
What I'm showing you
here is one frame.
So to get one
second of animation
for a film or
television project,
you're talking about 24 frames.
So right there, you're
looking at 24 hours
on a 32-core machine to render
one second of animation.
And then you want to go do
10 seconds of animation,
a minute of animation,
60 minutes, 90 minutes.
So by the time you're at a
full animated feature film,
the numbers are
really astounding.
And in this particular case, it's an hour render. But in the case of "The Good Dinosaur," Pixar's latest movie, I actually just saw these stats two days ago. That was a 48-hour-per-frame render project.
So it was something like
175,000 frames, each one taking
nearly two days of render time
on a high-core-count machine.
So imagine delivering a film of this scale and this scope. You're looking at renders in the hundreds of millions of core hours.
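To make that arithmetic concrete, here's a minimal back-of-the-envelope sketch in Python. It just reuses the rough figures quoted above (one hour per frame on a 32-core machine for the single-frame example, and roughly 175,000 frames at 48 hours per frame for a feature of that scale); carrying the 32-core assumption over to the feature-scale math is purely illustrative.

```python
# Back-of-the-envelope render math using the approximate figures quoted above.
CORES_PER_MACHINE = 32   # the example machine mentioned earlier (illustrative)
FPS = 24                 # frames per second of finished animation

def core_hours(frames, hours_per_frame, cores):
    """Total core hours = frames x wall-clock hours per frame x cores per frame."""
    return frames * hours_per_frame * cores

# One second of animation at ~1 hour per frame:
print(core_hours(FPS, 1, CORES_PER_MACHINE))          # 768 core hours

# A feature-scale project: ~175,000 frames at ~48 hours per frame.
print(core_hours(175_000, 48, CORES_PER_MACHINE))     # 268,800,000 core hours
```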
So it's very daunting.
It's very demanding.
And these numbers are
increasing year over year.
Audience expectations are getting higher and higher, and the demands and the vision of the director are also increasing in scope.
It's also important to note
that rendering isn't just
a really important
use case when you're
talking about big animated
feature films or movies
with superheroes or
spaceships or dinosaurs.
It's everywhere.
So rendering is
really ubiquitous
in what you're seeing
on television today,
at the cinema, or
whatever you're
watching on your mobile device.
And as Monique Bradshaw
is going to talk about
in greater detail, "The Walk"
is a perfect example of this.
So this is a
character-driven piece.
It tells the story of a Frenchman, Philippe Petit, who strung a high wire between the twin towers in New York and walked across it.
So there's no spaceships here.
There are no monsters.
But in order to create this environment, this photo-real world that helps sell the vision of the story, the visual effects studio that worked on this, Atomic Fiction, had to digitally re-create massive parts of New York and render it.
And for a story where nothing is obviously a visual effect, you're still looking at over 9 million core hours needed to make this movie.
So it's really important to
note that any time you're
watching something on TV, at the
cinema, on your mobile device,
it's very likely that there
was a significant amount
of rendering required
to produce that output,
whether or not you realize it.
And so what are the challenges
associated with rendering?
Obviously, as
we've demonstrated,
it's incredibly
computationally intensive.
And it's also very bursty.
Both Monique and Brennan can attest to the fact that you're sometimes working on a project for months at a time.
Maybe your on-premise capacity
is at maximum utilization,
and there are always last-minute
changes that come in.
The director has new
things they want to see.
The script's been revised.
So you can never
have enough capacity,
especially in that month or
two before final delivery
of a project.
There's always a need for
more-- always a need for more.
So bursty workloads
are really something
you see whether you're
working on a small commercial
and you've got five
or six projects coming
in at the last minute,
or a large feature film.
Another advantage, or another reason to have centralized compute, is multi-site locations.
We've really seen in the last
five to seven years studios
opening up worldwide locations.
So it's not just one studio working on a project for 10 or 12 hours a day and going home. Instead, they'll work on a shot, let's say at the headquarters in London.
That shot will get
passed off to the team
in Montreal working on it.
They'll work another
eight to 10 hours.
Then it'll get passed off to the
satellite office in Singapore.
So we're seeing more and more
of a 24-hour dynamic workload,
even among smaller studios.
So the need for scalable rendering compute resources is really critical.
And how does GCP fit
into all of this?
Well, our machine
capacity obviously
is a critical part
to all of this.
GCP runs on the same infrastructure that runs Google. So the systems we use to serve ads, Search, YouTube, and all of this were built at Google scale.
So for a studio to come to us and need to provision 20, 30, 40, or 50,000 cores in a matter of minutes is no problem.
Contrast that with the studio having to go out and find the physical space, bring in all the machines, and set up their networking, often when they only need it for a week or two of burst capacity. It's really not feasible.
So the scalability
we offer is really
unparalleled in the industry.
I think our pricing
model in particular
is fantastic for bursty
rendering workloads.
You only pay by the minute
after a 10-minute minimum.
So this allows you to
really scale your workload
in parallel.
Going back to the example I
gave before of the police car,
so let's say that's
one hour to render
and I have to render
one second of animation,
so that's 24 frames.
So what I would do on GCP is
spin up 24 VMs concurrently
and assign one frame
to each of those.
So there is no longer the concept of having to wait for a job to queue behind a limited number of available machines. Instead, the job is done as fast as possible and returned back to your artists so they can continue to iterate and revise work faster than ever.
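As a minimal sketch of that scheduling arithmetic, here's what the wall-clock difference looks like when the same frames are spread across more machines. The one-hour-per-frame figure is just the earlier example, and because billing is per minute, the total machine hours paid stay roughly the same either way.

```python
import math

def wall_clock_hours(frames, hours_per_frame, machines):
    """Frames are independent, so wall-clock time is set by the busiest machine."""
    return math.ceil(frames / machines) * hours_per_frame

frames = 24          # one second of animation
hours_per_frame = 1  # the police-car example from earlier

# One machine: frames queue one after another.
print(wall_clock_hours(frames, hours_per_frame, machines=1))    # 24 hours

# One VM per frame: the whole second comes back in roughly an hour.
print(wall_clock_hours(frames, hours_per_frame, machines=24))   # 1 hour
```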
And when it comes
down to running
an animation or a visual effects
studio, on a day-to-day basis,
your highest costs are really
that of your human capital.
You're paying these artists
several hundred dollars a day
to work on these shots.
And if you think
about it, if they're
sitting there waiting
for these renders
to be returned while they're
in a limited on-premise
environment, they're not
working when they should be.
They're not working
at full capacity.
So not only are you
taking into account
the infrastructure spend,
but you're also wasting
a lot of this human capital.
And then, specific to the visual effects, animation, and media and entertainment industry, we've done a lot of things at GCP that really differentiate us from the competition out there.
Studio-specific security is
a huge issue in our industry.
In many cases you're talking about billion-dollar tentpole blockbuster franchises.
So if any of that data
was to leak ahead of time,
the Hollywood
production studios could
be looking at hundreds of
millions of dollars of losses.
So there's an incredible
amount of concern
over putting your
data in the cloud when
it comes to this sensitive,
prerelease theatrical content.
So what we've done at
GCP is actually work
directly with the Hollywood
production studios
educating them around our
security and our compliance.
And in several
cases, we've actually
undergone studio-specific
audits of GCP
that have awarded us a Tier 1 certification,
allowing our customers to put
the most sensitive prerelease
theatrical content on our
cloud for rendering workloads.
We've also built
scalable solutions
for all types of studios.
So if you're a
small studio and you
want to do rendering
on GCP, we have
a product called Zync Render.
And this is a turnkey
solution, all inclusive,
no coding needed.
And it's literally a plug-in
you install inside of your host
application, and you can access
tens of thousands of GCP cores
to render your project
instantaneously.
And for larger studios
with more custom pipelines,
we have a dedicated
solutions architecture guide
that's been written
for us in conjunction
with a technical director at one
of the leading studios in North
America, a studio that's been
using GCP for well over a year
and a half to do
their rendering.
And this is a full breakdown,
a step-by-step guide
of how to integrate
your existing
on-premise infrastructure
and tie that
to GCP for rendering workloads.
And really the
backbone of all of this
is our ISV relationships.
In our industry, no matter if you're a five-person shop or a 1,500-person shop, the core software stack is the same.
Everyone is working with the
same standard commercial tools.
They might have their
own additional software
that they've written
on top of that.
But tools like NUKE from The Foundry, V-Ray from Chaos Group, and RenderMan from Pixar are the foundation of all rendering workloads in media and entertainment.
So we worked very closely with
the top ISVs in the industry.
And if you look
at this list here,
there's not a single
Oscar-winning film
over the last 15 years
that has not used some
or all of this software here.
So we're working very closely with these ISVs to enable scalable deployment on GCP that goes hand in hand with the scalable billing model we offer.
And the results so
far have been awesome.
We've seen hundreds of
millions of core hours
now put through GCP
across feature film,
commercial advertising,
episodic television, VR content,
live-action cinematic rides,
and animated projects,
just to name a few.
So we're excited to continue to
see this uptake of our product.
And I'll go ahead and turn it over now to Monique Bradshaw.
Again, Monique is VP
of business development
and operations at
ConductorIO, and she's
going to share the
story of Atomic Fiction.
Monique.
[APPLAUSE]
MONIQUE BRADSHAW: Thanks, Todd.
As Todd said, I'm Monique
Bradshaw from ConductorIO,
and ConductorIO is a company
that provides cloud computing
as a service.
We focus on the image-rendering
needs of the media
and entertainment space.
And ConductorIO provided the
cloud rendering capabilities
for Atomic Fiction for
the work that they did
on the recent movie "The Walk."
As Todd said, the movie "The Walk" tells the true story of wire walker Philippe Petit, who strung a wire, illegally, between the two towers of the World Trade Center and walked across it without any safety gear at all.
And today I'm here
to share with you
the story of how cloud
computing played a critical role
in Atomic Fiction's
ability to re-create
the twin towers and the
New York cityscape of 1974
for the movie.
But before I tell you
about the movie "The Walk,"
I'm going to tell
you a little bit more
about how compute resources
are used to create
visual effects for film.
Now, Todd touched on
this a little bit,
but I wanted to go in a
little bit more detail,
because it can get
pretty complicated.
So in the beginning
of a shot's life,
an artist works with a
lightweight wire frame
representation of the scene,
basically like the top image
there.
Now, the relative
simplicity of this scene
makes it quick to interact with
locally, so a team of artists
can work within interactive
graphics programs
to create recipes
for how cameras move,
how surfaces are textured, and
how lighting falls on objects.
Now, once that recipe is
defined, it needs to be baked.
This process of calculating how light is received by and reflects off of millions of surfaces is called rendering.
And the results of
this process can
be seen in the bottom image.
Now, in a film, a
single second of footage
is typically made up
of 24 images or frames.
It's not unusual for a single
iteration of a single frame
to take between 5 and 10
hours, sometimes even more,
to render on a single
multicore machine.
So given that, it's easy to
multiply that out and see
how a single iteration of
a single second of film
can actually require thousands
of hours of compute time.
So to understand how
important cloud resources were
to the creation of the
effects in "The Walk,"
let me talk a little bit
about the hurdles that
had to be overcome for this.
As I mentioned, ConductorIO
worked with Atomic Fiction,
and they were the
visual effects company
that was charged with
putting audiences 1,400 feet above New York for the climax of the film.
The first challenge
they had to overcome
was creating the environment.
The towers themselves had
to be built perfectly,
down to every last nut and bolt.
An artist then spent months
introducing imperfections to
make them look hand-built.
The activity in the streets below, from the cars to the people to the flocks of birds to the moving atmosphere, was all created from scratch.
The scenery below
was created too,
with spinning fans and
rooftop HVAC systems,
rain gutters on buildings,
hot dog stands in the streets,
and changing stoplights to
make the city feel alive.
Every conceivable camera
angle had to be accommodated,
and that meant a number of
lighting scenarios-- everything
from nighttime to early morning
to a blustery, overcast day,
to a sunny fall afternoon.
All of that flexibility
and complexity
meant that a massive amount of
compute time would be required.
For "The Walk," a
single iteration of one
frame required 83
gigabytes of input data
and took over seven hours to
render on a 16-core instance.
If you multiply that out, and
given the nature of production
deadlines, it meant that the
artists at Atomic Fiction
would need the
ability to access over
15,000 cores of compute
power simultaneously
for sustained periods of time.
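Just as a rough illustration of where that 15,000-core figure comes from, here's the arithmetic using only the numbers above; the real provisioning obviously depended on deadlines and the mix of shots in flight.

```python
# Rough arithmetic behind the 15,000-core figure, using the numbers above.
HOURS_PER_FRAME = 7        # "over seven hours" per iteration of one frame
CORES_PER_INSTANCE = 16    # rendered on 16-core instances

core_hours_per_frame = HOURS_PER_FRAME * CORES_PER_INSTANCE
print(core_hours_per_frame)            # 112 core hours per frame iteration

# With 15,000 cores available, roughly this many frames render at once:
print(15_000 // CORES_PER_INSTANCE)    # 937 frames in flight simultaneously
```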
The second challenge that
Atomic Fiction had to face
was the tight budget that they
had to create these effects.
Now, for a medium-sized
company to acquire
that amount of
compute infrastructure
locally would require
leasing or renting
a large number of computer systems, as well as sourcing a colocation facility and the appropriate technical support, and all of that would add up to a financial obstacle that would have blown their budget.
Additionally, visual
effects companies
don't realize any profit
till the end of a project.
So it was incredibly
important for Atomic Fiction
that they could get up and
running with as little overhead
as possible.
This is an important graph.
It illustrates the
financial challenges
typical of rendering needs in
the visual effects industry.
So as an example of a
medium-sized studio,
the jagged blue line represents
Atomic Fiction's actual work
profile and relevant spend
for compute resources.
The red line represents a
typical resource profile
for a growing visual
effects company
that's investing in
on-premises resources.
In this traditional
environment, resources
are required to try to
cater to the peak need.
The time, effort, and money required to implement that additional hardware mean that the resource profile plateaus around the last peak need and can't adjust downward to match the actual render workload at given times.
So what you can see here is
that the pink shaded area
is where on-site resources
are overprovisioned
for the actual
workload, and that
represents wasted investment.
The blue shaded area is
where on-site resources were
underprovisioned for
workload requirements,
and that represents a negative
impact to artist productivity.
So as you can see, the spiky nature of typical content creation workloads results in companies cycling between limited computing power that negatively impacts the productivity of their artists and too much deadweight being carried during downtimes.
So the solution to these
challenges of complexity,
cost, and challenging
work profiles
was to utilize the flexible
capacity of the cloud.
And that's where
ConductorIO comes in.
So Atomic Fiction utilized
Conductor's cloud rendering
platform to manage its cloud
computing needs for the film.
Now, the benefits
of cloud computing
can really be summed
up in one word-- scale.
If you take a look
at this simple table,
you can get a very
clear contrast
in two styles of managing work.
In the first column,
you see 1,000 tasks
spread across 50 instances that
result in a total turnaround
time of 10 hours.
In the second column,
you see the same number
of tasks spread across
1,000 instances,
resulting in a turnaround
time of only half an hour.
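The arithmetic behind that table is simple. Here's a quick sketch, assuming each task takes about half an hour, a figure inferred from the table rather than stated explicitly:

```python
import math

def turnaround_hours(tasks, instances, hours_per_task=0.5):
    """Turnaround when independent tasks are spread evenly across instances."""
    return math.ceil(tasks / instances) * hours_per_task

print(turnaround_hours(1000, 50))     # 10.0 hours
print(turnaround_hours(1000, 1000))   # 0.5 hours
```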
Now, as you can see, there are good reasons for each style of resource allocation,
but one of the biggest
benefits of elastic scaling
is the ability to scale up
rapidly as the need demands,
allowing artists
to iterate faster.
That improves productivity
and creative quality
by limiting the amount
of time the artists
have to wait around to see
the outcome of their work.
And it allows them to be
creative and iterate faster
in the same amount
of calendar time.
Large peak capacity
and rapid downscaling
lets teams postpone
their render needs
until later in the schedule,
and that's creatively
and financially beneficial.
The on-demand scalability
afforded by GCP
allows companies to maintain
a direct relationship
between project
cost and revenue.
No longer does the
studio's revenue
need to cover unused but
costly infrastructure.
In addition, these
days, data and analytics
are critical aspects of
pretty much any company.
The scale of data
collection and analysis
of large amounts
of metadata that's
enabled by leveraging
cloud architecture
also allows for
better cost management
and streamlining of workflows.
So Conductor's written on top of the Google Cloud Platform,
and it provides functionality
critical to meeting
the needs of a demanding
visual effects production.
For us, there were a
number of key factors
that we needed to evaluate
in choosing to use Google.
The big one from a business perspective is the ability to do per-minute billing, which Google enables.
When we're providing a service
to an industry where workloads
are often broken down into tasks
that require fractions of hours
of compute time, the ability
to offer true pay per use
is a critical aspect
and a critical part
of our value proposition.
Now, obviously Google's
a global company,
and that global presence is
incredibly important for us.
Single customers that we have
may have studios in locations
as diverse as Europe,
North America, and India,
and being able to provide a
standardized service that they
can interact with from any
location in the same way
is very important.
Also, having access to the vast
engineering talent of Google
to help us work through
architecture optimization
or resolve issues
gives us the confidence
to provide a service to
businesses where up time is
incredibly important.
Now, one of the holy grails
for the visual effects industry
is really being
able to mine data
for bidding accuracy and
thorough render cost analysis.
And accurately predicting
the time and cost
for a given render
job is also critical.
Now, to do that at scale, where
we have thousands or millions
of concurrent tasks
generating data constantly,
we needed a powerful
database option.
And for us, that's Bigtable.
It's huge, it's extensible, and it's incredibly responsive.
And when we have huge numbers
of jobs running at the same time
through Conductor,
we can then use
BigQuery to access that
data and present it quickly
to our customers, providing them
real-time information about how
their work is going.
Finally, Google App
Engine offers us
a ton of configurability
out of the box, which
is really important for
a small company to get up
and running quickly.
And it's an important
capability for our team,
because it allows us to focus
on implementing our key roadmap
features rather than worrying
about our application
architecture.
All right, so to talk a
little bit about Conductor
architecture, one of the
most important things for us
was to have as little impact
on the artist's workflow as
possible.
So we created plug-ins that fit into the most commonly used software for artists, and that gets installed on the artist's local machine.
Other than that, the
artist's workflow
is pretty much unchanged.
They work as they always have.
They save their data
to the right place
in the local network,
and that's it.
When the artist is
ready to render,
they pull up the installed Conductor plug-in, enter a few parameters, things like choosing the instance type and setting up the frame range, and then they just hit Submit.
The plug-in then sends
those job parameters
to the Conductor
application, which
serves as the brain of
the Conductor service.
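Conductor's actual submission API isn't shown in the talk, so purely as an illustration, here's a hypothetical sketch of the kind of job-parameter payload a plug-in might hand off on Submit. Every field name here is invented for the example.

```python
# Hypothetical job-parameter payload of the sort a submission plug-in might
# send to a cloud rendering service. Field names are invented for this sketch;
# they are not Conductor's actual API.
import json

job = {
    "project": "example_show",                      # invented project name
    "scene_file": "/shows/example/shot_010/lighting/shot_010_v007.ma",
    "instance_type": "16-core",                     # the artist picks an instance type
    "frame_range": "1001-1240",                     # and a frame range
    "output_path": "/shows/example/shot_010/renders/v007",
}

print(json.dumps(job, indent=2))                    # what gets posted to the service
```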
The application is hosted
in Google App Engine,
and there it initiates a
number of key functions.
So the app uploads
dependent files and data,
including files, scripts,
and environment variables
via an uploader daemon
from the local file system
to the cloud, where we also do
a deduplication process to make
sure that we're
using the storage as
effectively as possible.
The data is initially
stored in GCS,
and then specific instances are
spun up to sync data from GCS
to a Gluster and a [INAUDIBLE].
And we scale up at 100 render instances per Gluster instance in order to meet our I/O needs.
Now, the Conductor app also
determines the required number
of render instances and
spins up the right number
from Google Compute Engine,
scaling up quickly, reliably,
and massively.
We run a Docker container that's
customized for our execution
environment.
Using Docker allows us
to potentially accept
custom images from customers,
and that's a major benefit
for highly customized studios.
The render instances
read the necessary input
files from the Gluster share.
Once all the
components are running
and active, the Google
app-- or excuse me,
the Conductor app manages
the rendering process
from start to finish,
including licensing, queueing,
error handling, and
metrics collection.
The web UI, also hosted
from Google App Engine,
allows users to control
and monitor the render jobs
from wherever they are.
Datastore is used to store state and operations.
Job metrics are stored in
BigQuery for easy access,
and also made available
via the web dashboards.
Once a job completes, the output
is stored in GCS and Gluster
NFS for future ease of use.
The data is also downloaded
to a local path specified
at job submission,
so the artists
have that in their
local environment again.
If there are no further jobs
requiring that same instance
configuration, then
the Conductor app
shuts down any
unneeded instances.
So what does that
really mean and result
in when it's put into use?
Let's take a quick look at a two-minute clip of some of the effects for "The Walk" that were generated using Conductor.
[VIDEO PLAYBACK]
[MUSIC PLAYING]
[END PLAYBACK]
[APPLAUSE]
So that gives you an idea
of the real complexity that
goes into making a
movie, and also it
tells you that the best
effects are the ones that you
don't even know are there.
So here's the final breakdown
for the cloud computing
that went into
creating those effects.
Over 9 million core
hours were used
to create about 30
minutes of film.
The final portion of the film
was all done through Conductor.
And of those 9 million hours,
almost 3 million of them
were in the last month alone.
It gives you amazing evidence
of the power and flexibility
of scalability.
Now, all of that
equated to a peak total
of over 15,000
concurrent cores running
in Google Cloud for
sustained periods of time.
Another major benefit of
the distributed nature
of cloud computing is how
accessible it is to people
all over the world.
For the portions of this movie
worked on by Atomic Fiction,
75 artists collaborated from two
different locations-- the Bay
Area and Montreal,
Canada-- without any change
to workflow or additional
management overhead.
As you can see, in order to
accommodate the peak render
load, we estimate that it
would have cost Atomic Fiction
about $1.5 million to
procure the necessary local
infrastructure.
By using Conductor and the power of cloud computing, Atomic Fiction realized a cost savings of about 50% on render infrastructure alone.
Now, as impressive
as all of those
numbers are, for me,
this quote really
captures the power and the
promise of cloud architecture.
This is what we have
the ability to do
when we are able to
leverage the capabilities
and flexibility of the cloud.
We have the ability to
fundamentally change
the way an industry works.
Thank you very much.
[APPLAUSE]
TODD PRIVES: Thanks for that,
Monique-- some really great
insight into how they
built their platform using
GCP and some of the
amazing photo-real
visual effects work
that they completed.
So now we'll hear from Brennan Chapman. Brennan, again, is the head of pipeline at Oscar-winning Moonbot Studios.
BRENNAN CHAPMAN: Thanks, Todd.
[APPLAUSE]
Hi, I'm Brennan Chapman, head
of pipeline at Moonbot Studios.
Moonbot is based out of
Shreveport, Louisiana.
We have about 50
people on staff.
We've been honored with
quite a few different awards.
A couple of projects you
might recognize us from
include "The
Fantastic Flying Books
of Mr. Morris Lessmore,"
"The Scarecrow,"
as well as "The Numberlys."
These take multiple forms,
including short films, apps,
games, and even printed books.
We pride ourselves as being
a multi-platform storytelling
studio, and we use whatever medium we see as the best fit to tell our stories.
A story I'd like to
share with you today
is called "Taking
Flight," and it's
based on the life and heritage
of Antonio Pasin, the inventor
of the Radio Flyer wagon.
This short was rendered
in part by Google Cloud.
[VIDEO PLAYBACK]
[MUSIC PLAYING]
-You know, that wagon
used to be your dad's.
I guess your dad didn't
tell you about the aliens.
[ZAP]
[END PLAYBACK]
[APPLAUSE]
BRENNAN CHAPMAN: Hope
you guys enjoyed that.
So that story was great, and we
used a lot of cloud rendering
on it.
But the question is, why use
cloud rendering at Moonbot?
What is the benefit we get?
And we've sort of touched
on it a little bit,
but just to state it from an
exec at Moonbot's perspective,
it increases our
creative freedom,
which is our golden
ticket for Moonbot.
It allows our artists
to be more creative.
It frees them from the restraints of the fixed capacity of a local render farm and lets us scale to accommodate any creative goal. And because of that, we're able to meet our deadlines and not go past them, which is huge.
And also, it helps
us to save money.
As they've stated, rendering is a very bursty workload. And if we don't have to buy a local compute farm that scales to the peak times and sits underutilized at other times, we save a lot of money.
Similar to Monique's graph,
here's a graph of-- last year
we had about three projects
that we were rendering.
They were all sort of
going on at the same time,
including this project we just
showed you, "Taking Flight."
And you can see three main
things with this graph.
One, you can see that there are these high peak times, times when we have an enormous amount of renders. And by the way, this is a graph of the number of hours artists have submitted per day to the farm. The second thing you can see is there are times where we hardly use anything, maybe nothing at all.
And the third thing
that you'll see
is there's this orange line that we sort of target at Moonbot, and that's sort of the median, the capacity that we use all the time. And our goal is to have local compute capacity for that median line.
Anything above that, we want
to push it to the cloud,
'cause that's more
cost effective for us.
But anything below that, we'll
render locally at Moonbot.
So our rendering goals for the implementation at Moonbot were that we wanted it to be easy to use, we wanted local and cloud, so a hybrid approach, and we only wanted to use the cloud when needed.
We didn't want to
treat cloud the way
we do local nodes,
where we basically just
spit out 50 nodes that just
sit there in the cloud.
We wanted to only spin up
nodes when we're using them,
and we try to be smart
about how that works.
So there's two render farm
strategies that we see.
If we do everything
on the cloud,
you get sort of the
render everything
as soon as possible
strategy, and that's
because if you rent
one node for 100 hours,
you're paying the
same cost on the cloud
as renting 100
nodes for one hour.
So if it costs the same,
why not render it faster?
So that's great.
So that's cloud only.
But with a hybrid farm, you sort
of get a different approach.
You have this local
compute capacity,
and you get the best cost savings when you fully utilize that local compute capacity and push anything extra to the cloud. But with that, you need a time component. You need a deadline for each of your renders to be able to calculate, am I fully using my local farm? And once the local farm is full, you push the rest to the cloud.
So to do that, we present
our artist with this dialog
when they submit
stuff to the farm.
And the first option just asks
them, when do you need this by?
And they can set whatever
time factor they want.
It defaults to the next
morning, because a lot
of times they submit before
they leave for the day.
But this gives us
our time factor
to be able to know, when is
a render being completed by?
And if we have time, let's
render it on the local farm.
And if not, we'll
render it in the cloud.
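A minimal sketch of that routing decision, with invented numbers; the real estimates come from the forecasting approach described a bit later:

```python
def route_job(estimated_hours, local_hours_free_before_deadline):
    """Keep a job local if the farm can finish it before the artist's deadline;
    otherwise burst it to the cloud."""
    if estimated_hours <= local_hours_free_before_deadline:
        return "local"
    return "cloud"

# Example: a job estimated at 300 render hours against a local farm with
# 200 free render hours before the next-morning deadline.
print(route_job(300, 200))   # cloud
print(route_job(150, 200))   # local
```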
So great.
So why did we choose
Google Cloud to do this?
Three main things.
They had great support.
So talking with them
originally, they've
helped us [? fix ?]
a lot of stuff,
and set up our pipeline
to easily support this.
They even suggested the
second option, Avere,
as our storage vendor, and I'll
get into that in one minute.
And also they have per-minute billing, which is huge for us. We only pay for pretty much exactly what we use. We're not paying for time slots where we might not use the full capacity.
So implementing the cloud
rendering into our pipeline,
we sort of faced three main challenges, and I'd like to share some details about those with you.
First is storage-- how do we
transfer our files to the cloud
to be able to render with them?
So here's what we
need to transfer.
We need to transfer 3D scene files and textures, and then we need to download the rendered frames and maybe logs back.
We're also in Shreveport.
We don't have the fastest
internet available.
We're sort of limited
to this 100-megabit
per second connection.
So we have to be smart about what we're uploading and when we're uploading it.
There are two ways that we saw we could attack this problem. The first way is by writing custom scripts targeted at each application. So, how does the application load in data? We have to isolate all of that, maybe parse the scene files or something like that. And then whenever you render it out, we have to find what it rendered and transfer those files back locally. But this is all very application specific.
So through Google's help, we also figured out there's another solution called Avere, and this solution acts like a cache for your file system. And because it happens at the file system level, it can be a one-size-fits-all approach.
So let's look more at that.
So here's sort of a graph of how
our infrastructure is set up.
We have an Avere cache node
that sort of sits in the cloud,
and we have our core
filer that sits on site.
And Avere will
automatically upload data
as it's requested to the cloud.
And the cloud keeps this cache
that can be then distributed
to all the render nodes.
The render nodes connect into Avere and just ask for files. If Avere doesn't have them, it pulls them from the core filer. If it already has them, it just serves them out of the cache.
The way this works is it mounts via NFS, which is the same way our local storage is handled, and that creates this nice uniformity between our local farm and our Google Cloud. It makes things nice and easy.
It also autouploads dependent
files, since you're doing this
just at a file system
level, and also supports
partial file transfer.
So if you only need the head of your file, you don't need to read the whole thing.
Avere is smart enough to only
transfer that part of the file.
The other challenge is, how
do we set up render nodes?
How do we ideally have uniformity between how our local nodes are set up and how the cloud nodes are set up?
And we came up
with this workflow.
We take sort of a base OS.
We use CentOS in our
case in the cloud.
And we run our configuration
management tool.
We use Ansible at Moonbot to
configure our local nodes.
And then we can just run that same thing again in Google Cloud on the base OS to get all the software and configuration and everything installed in a VM. We then bake that into an image in Google Cloud and plug it into an instance template and instance group to be able to spawn pools of those nodes.
There are two instance types that we look at at Moonbot.
There's the on-demand
and preemptible nodes.
Our goal at Moonbot is
to try to take advantage
of the preemptible
nodes, because they're
less than half the cost.
But they come with
two side effects.
Their availability
is not guaranteed,
and they also get terminated after at most 24 hours.
The benefit, though, is that render farms are capable of handling faults: if a frame fails, it will automatically be retried. So by nature, as long as the preemptible instances aren't too wildly unavailable, we can take advantage of these preemptible instances and their cost savings.
And we use instance groups to
manage the pools of instances.
This is especially useful when
you have preemptible instances,
because their availability
is not guaranteed
and sometimes they'll go away.
The instance groups
will automatically
manage adding them
back if they ever
become available again,
so we don't have to write
all the code to handle that.
We're also able to start and
stop these pools of instances,
and we sort of see a start-up time of about three minutes to get everything running.
So I can go into
the instance group,
say I need 50
nodes or 100 nodes,
and three minutes later
they're on my farm
rendering and doing the work.
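Putting those pieces together, here's a rough sketch of the bake-and-spawn workflow driven through the gcloud CLI from Python. All the names, the machine type, the zone, and the sizes are placeholders, and it assumes the Cloud SDK is installed and authenticated; it's not Moonbot's actual tooling.

```python
# Rough sketch of the bake-and-spawn workflow: bake a configured VM into an
# image, wrap it in an instance template, and spawn a managed instance group.
# Names, machine type, zone, and sizes are placeholders.
import subprocess

def gcloud(*args):
    subprocess.run(["gcloud", "compute", *args], check=True)

ZONE = "us-central1-f"  # placeholder zone

# 1. Bake the configured build VM's disk into a reusable image.
gcloud("images", "create", "render-node-v1",
       "--source-disk", "render-node-build", "--source-disk-zone", ZONE)

# 2. Wrap the image in an instance template (preemptible to cut cost).
gcloud("instance-templates", "create", "render-node-template",
       "--image", "render-node-v1", "--machine-type", "n1-standard-16",
       "--preemptible")

# 3. Create a managed instance group from the template, starting empty.
gcloud("instance-groups", "managed", "create", "render-pool",
       "--template", "render-node-template", "--size", "0", "--zone", ZONE)

# 4. When renders queue up, resize the pool; nodes show up in a few minutes.
gcloud("instance-groups", "managed", "resize", "render-pool",
       "--size", "50", "--zone", ZONE)
```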
Instance performance
was interesting, too.
We found that not all instances
in Google Cloud are the same.
Be careful if you're doing
this, because we found
that Haswell nodes
were about 20%
faster than the
Ivy Bridge nodes,
even though they might
have a lower clock speed,
for our specific
rendering applications.
And with that, we sort of
saw an 80% performance match
with cloud nodes
compared to local nodes.
So it is slower,
but the benefit is
we can have a whole
lot more of them.
So the last challenge
we really faced
is how to forecast renders.
So if we want to do
this hybrid approach,
how do we really get that
data to be able to know
when to submit to the cloud?
So what we need to compute is some kind of total of the pending render hours, and then from that we get the delayed render hours at a specific point in time.
So the way that we
do this is an artist
submits their job to the
farm, using the dialog
like you saw earlier,
and we split it off
into two main groups.
We have sort of preview
frames and other frames.
So what that means is you
might have a shot that's
about 1,000 frames long.
You would take the
first, maybe middle,
and last, and you would render
those with the highest priority
first, maybe on your
local farm-- just whatever
gets it done fastest.
Then you take the time that
it took to render those frames
and use that to
estimate the time it's
going to take to render
all of the other frames.
We do this for each of the jobs
that are submitted to our farm,
and we end up with a total count
of the pending render hours.
We then use benchmarks
of our local nodes
to be able to see how
much local capacity
we'll have in a
certain amount of time.
From that, we can calculate how many delayed render hours there are. We can then use benchmarks of our cloud nodes to get an exact count of how many cloud nodes we'll need to add at any time to complete our renders by their deadlines.
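Here's a minimal sketch of that forecasting pass. All the numbers are placeholders; in practice the per-frame estimate comes from timing the preview frames, and the 0.8 factor reflects the roughly 80% cloud-to-local performance match mentioned earlier.

```python
import math

def estimate_job_hours(preview_times_hours, remaining_frames):
    """Extrapolate remaining render hours from the preview frames' average time."""
    avg = sum(preview_times_hours) / len(preview_times_hours)
    return avg * remaining_frames

def cloud_nodes_needed(pending_hours, local_capacity_hours, hours_until_deadline,
                       cloud_speed_factor=0.8):
    """Cloud nodes required to absorb whatever the local farm can't finish in time."""
    delayed_hours = max(0.0, pending_hours - local_capacity_hours)
    if delayed_hours == 0:
        return 0
    # Each cloud node contributes (speed factor x hours until deadline) of work.
    return math.ceil(delayed_hours / (cloud_speed_factor * hours_until_deadline))

pending = estimate_job_hours([0.5, 0.6, 0.55], remaining_frames=997)  # ~548 hours
print(cloud_nodes_needed(pending, local_capacity_hours=200, hours_until_deadline=14))  # 32
```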
So great.
So that's most of what we do
at Moonbot to get everything
on cloud rendering,
and we end up
with these three main benefits.
We get creative freedom.
We're more flexible.
We can scale on demand.
It removes constraints on
scheduling, which is huge.
I can't tell you
how many times I've
had to rebudget based
on schedules shifting.
But the cloud enables that
to be much more flexible.
It saves time.
We get faster rendering,
simplified budget planning,
and increased flexibility.
We pay as we go.
We don't have to have this
capital expenditure up front.
And it enabled us to succeed
as a company at Moonbot.
So thank you.
[APPLAUSE]
And to finish up, back to Todd.
TODD PRIVES: Thank you.
Thanks everyone for
coming to our session
and hearing from our
two great presenters
on how GCP has really enabled
them to dynamically scale
their businesses
in a manner that's
cost effective and
really, most importantly,
gives them a lot of
creative freedom.
So thanks again.
And we'll see you all tonight at GCP NEXT after dark.
[APPLAUSE]
[MUSIC PLAYING]
