DUSTIN KIRKLAND: Hello. My name's Dustin Kirkland. I'm a product manager at Google, and I'm joined here by Ricardo Rocha. Ricardo gave a keynote about his experience as a computing engineer at CERN IT, working on the Higgs boson experiment. Ricardo, can you give us maybe a short recap of your keynote?
RICARDO ROCHA: So the initial goal we had when we first came here was to try to reproduce the analysis that ended up earning a Nobel Prize. The prize was awarded to physicists for the discovery of the Higgs boson. And we wanted to redo that analysis using new technologies: Kubernetes and other cloud native technologies.
The initial analysis was done in 2012, or just before, and announced just a year after. It used a lot of computing resources, and we wanted to show how much easier it is with more recent technologies, a couple of years later. The original analysis took hours or even a day, and we wanted to do it in three minutes. So that was the goal. It was quite challenging, but it actually works. So it was pretty awesome.
DUSTIN KIRKLAND: Three minutes to reproduce the results of the original Higgs boson experiment?
RICARDO ROCHA: Yeah. So initially, because we had a 20-minute slot, we started by targeting 15 minutes. And then we realized that, with the help of Google Cloud, we could actually go faster. So we ended up doing it in around three to four minutes.
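To make that fan-out concrete, here is a minimal sketch of how an analysis like this could be submitted as a parallel, indexed Kubernetes Job from Python. The image name, input path, and worker counts are illustrative assumptions, not the actual CERN deployment.

```python
# Hypothetical sketch: fan an analysis out as a parallel, indexed Kubernetes
# Job on a GKE cluster. Image, input path, and worker counts are illustrative.
from kubernetes import client, config

config.load_kube_config()  # assumes kubectl already points at the GKE cluster

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="higgs-analysis"),
    spec=client.V1JobSpec(
        parallelism=500,            # pods running at once; tune to cluster size
        completions=500,            # one completion per data partition
        completion_mode="Indexed",  # each pod reads JOB_COMPLETION_INDEX
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="worker",
                        image="gcr.io/example/higgs-analysis:latest",
                        args=["--input", "gs://example-bucket/open-data"],
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "1", "memory": "2Gi"}
                        ),
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```

Each worker picks its data partition from its completion index, which is what lets wall-clock time shrink roughly in proportion to the number of pods.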
DUSTIN KIRKLAND: Wow. Can you give us a sense of some of the differences between running that sort of compute on-prem, on your own infrastructure, versus running in a hosted cloud with Google?
RICARDO ROCHA: So it changes our focus. We have a large data center, but large in the sense of science, of course. If you look at Facebook, or Google, or all these big public cloud providers, it's a different scale. In our own environment, we know it very well and we know how to tune for it, but we also have to spend a lot of time maintaining those resources. When we go to the public cloud, the Google Cloud, we still have to learn its details, but we don't have to focus so much on the infrastructure. We can focus more on the application, which is a benefit for us. We can work at a higher level.
DUSTIN KIRKLAND: So you can spend your time and your investment a bit more on the experiments and the data itself, rather than managing the infrastructure?
RICARDO ROCHA: Right. That's exactly it. There are still advantages to having our own data center for some of the workloads, because they are very close to the experiments. The experiments are basically on site, so we can have very fast links and process the data with very low latency. But for a lot of our workloads, going to the public cloud allows us to burst for extra capacity when we need it, and to focus more on the application side.
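A minimal sketch of what that bursting can look like on the Google Cloud side, using the GKE admin API to turn on node-pool autoscaling. The project, location, cluster, and node-pool names here are hypothetical.

```python
# Hypothetical sketch: enable autoscaling on a GKE node pool so the cluster
# can burst out for extra capacity and shrink back when the work drains.
from google.cloud import container_v1

client = container_v1.ClusterManagerClient()
request = container_v1.SetNodePoolAutoscalingRequest(
    name=(
        "projects/example-project/locations/europe-west1"
        "/clusters/analysis/nodePools/workers"
    ),
    autoscaling=container_v1.NodePoolAutoscaling(
        enabled=True,
        min_node_count=0,     # scale to zero when idle
        max_node_count=1000,  # burst ceiling during a big reprocessing run
    ),
)
client.set_node_pool_autoscaling(request=request)
```

With autoscaling enabled, a large batch of pending pods drives the pool up toward the ceiling, and it shrinks back toward zero once the queue empties.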
DUSTIN KIRKLAND: Can you give us a sense of the amount of resources you need for this experiment: CPUs, memory, disk? You told us the time, but what about the other bits?
RICARDO ROCHA: Right. So the experiments generate something like 1 petabyte a second, which is way too much. So we have hardware triggers and software filters, and we reduce that to something like 10 gigabytes a second. That's what we store. In total, we are generating 70 petabytes a year of new data that we have to analyze. In house, we have around 300,000 cores, but then we use a distributed computing environment that roughly doubles that, so we have something like 700,000 cores to process all this data. And in house, we have a total of 400 petabytes of storage right now.
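As a rough back-of-the-envelope check on those figures: the trigger and filter chain cuts the data rate by about five orders of magnitude, and 70 petabytes a year at 10 gigabytes a second works out because the accelerator only delivers beam for part of the year:

```latex
\frac{1\ \mathrm{PB/s}}{10\ \mathrm{GB/s}}
  = \frac{10^{6}\ \mathrm{GB/s}}{10\ \mathrm{GB/s}} = 10^{5},
\qquad
\frac{70\ \mathrm{PB}}{10\ \mathrm{GB/s}}
  = 7\times 10^{6}\ \mathrm{s} \approx 80\ \mathrm{days\ of\ beam\ time\ per\ year}.
```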
DUSTIN KIRKLAND: Wow, wow. Give us a sense of what's coming next: maybe another experiment, or how you plan to use your on-prem versus Google Cloud infrastructure in the future.
RICARDO ROCHA: So that's very exciting for us, because the accelerator is now going into an upgrade phase, and then it will have one more run. But what comes after that is much more exciting. We call it the High-Luminosity LHC, and the reason it's high luminosity is that it generates a lot more collisions, and so a lot more data. We're increasing the amount of data we have to process by 100 times. And right now, we don't really know how we'll manage all of this, on both the storage side and the computing side. So we're investing a lot in machine learning, and in learning how to make better use of GPUs and potentially TPUs. And of course, because we have a limited amount of resources in house, we're looking a lot at expanding to the public clouds, both to have more resources and to be able to use new technologies like TPUs. It's very exciting.
DUSTIN KIRKLAND: Last question: can you give us a sense of which of the Google Cloud products you were making use of here?
RICARDO ROCHA: So we based the whole deployment on GKE for managed Kubernetes. But we also used Google Cloud Storage to host the data, and we were reading from it. And then we had to publish the results: at CERN we were using Redis, so on Google Cloud we used Cloud Memorystore, which is the equivalent. So we spanned quite a lot of products. We could probably do even more if we invested more money.
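As a rough illustration of that pipeline, the sketch below reads one data partition from Cloud Storage, runs a stand-in analysis step, and publishes the result to Memorystore over the standard Redis protocol. All names here (bucket, object, host, keys) are hypothetical.

```python
# Hypothetical sketch of the data path described above: read input from
# Google Cloud Storage, compute, and publish the result to Cloud Memorystore,
# which speaks the standard Redis protocol.
import redis
from google.cloud import storage


def run_analysis(data: bytes) -> str:
    # Stand-in for the actual physics code; here it just reports a byte count.
    return str(len(data))


# Read one partition of input data from a GCS bucket (illustrative names).
bucket = storage.Client().bucket("example-higgs-data")
payload = bucket.blob("partitions/partition-0001.root").download_as_bytes()

result = run_analysis(payload)

# Memorystore exposes a private IP inside the VPC; any Redis client can connect.
r = redis.Redis(host="10.0.0.3", port=6379)
r.hset("higgs:results", "partition-0001", result)
```

Because Memorystore is protocol-compatible with Redis, code written against a self-hosted Redis at CERN can publish to it without changes beyond the connection details.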
DUSTIN KIRKLAND: Ricardo, this is fascinating. Thank you so much.

RICARDO ROCHA: Thank you.
[MUSIC PLAYING]
