>> Live from Barcelona,
Spain, it's theCUBE,
covering KubeCon +
CloudNativeCon Europe 2019.
Brought to you by Red Hat,
the Cloud Native Computing Foundation
and Ecosystem Partners.
>> Welcome back to theCUBE,
here at KubeCon + CloudNativeCon
2019 in Barcelona, Spain.
I'm Stu Miniman.
My co-host is Corey Quinn
and we're thrilled to
welcome to the program
two gentlemen from CERN.
Of course, CERN needs no introduction.
We're going to talk some
science, going to talk some tech.
To my right here is Ricardo Rocha,
who is a computer
engineer, and Lukas Heinrich,
who's a physicist.
So Lukas, let's start with you, you know,
if you were a traditional enterprise,
we'd talk about your business,
but talk about your
projects, your applications.
What piece of, you know, fantastic science
is your team working on?
>> All right, so I work on an
experiment that is situated
at the Large Hadron Collider,
so it's a particle accelerator experiment
where we accelerate protons,
which are hydrogen nuclei,
to a very high energy,
so that they travel at almost
the speed of light.
And so, we have a large
tunnel underground,
100 meters underground in Geneva,
so straddling the border
of France and Switzerland.
And there, we're accelerating two beams.
One is going clockwise.
The other one is going counterclockwise,
and there, we collide them.
And so, I work on an
experiment that kind of looks
at these collisions and
then analyzes this data.
>> Lukas, if I can,
you know, when you talk
to most companies, you talk about scale,
you talk about latency,
you talk about performance.
Those have real-world
implications for your world.
Do you have anything
you could share there?
>> Yeah, so one of the main
things that we need to do:
we collide these protons
40 million times a second,
and we need to analyze them in real time,
because we cannot write
out all the collision data
to disk; we don't
have enough disk space.
So we essentially run a 10,000-core
real-time application to
analyze this data
and see which collisions are
actually most interesting,
and then only those get
written out to disk.
This is a system that I
work on called the trigger,
and yeah, that's pretty
dependent on latency.
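To make that concrete, here is a minimal sketch of the trigger pattern Lukas describes: score each event in real time and keep only the rare interesting ones. All names and thresholds are illustrative, not CERN's actual trigger code.

```python
# Toy trigger: score each collision event as it streams by and
# persist only the rare events above threshold; the rest are
# discarded forever. Names and thresholds are purely illustrative.
import random

KEEP_THRESHOLD = 0.9999  # only a tiny fraction of events survive


def interestingness(event: dict) -> float:
    """Stand-in for the real physics scoring of an event."""
    return event["score"]


def trigger(event_stream, write_out):
    for event in event_stream:
        if interestingness(event) > KEEP_THRESHOLD:
            write_out(event)


if __name__ == "__main__":
    events = ({"score": random.random()} for _ in range(1_000_000))
    kept = []
    trigger(events, kept.append)
    print(f"kept {len(kept)} of 1,000,000 events")
```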
>> All right, Ricardo, luckily
your job's easy.
With most people, we say you
need to respond, you know,
to what the business needs
from you, and, you know,
don't worry, you can't go
against the laws of physics.
Well, you're working on physics here,
and boy those are some
hefty requirements there.
Talk a little bit about that
dynamic and how your team
has to deal with some
pretty tough challenges.
>> Right, so, as Lukas
was saying, we have this
large amount of data.
The machines can generate
something on the order
of a petabyte a second, and then,
thanks to the hardware-
and software-level
triggers, they reduce
this to something like
10 gigabytes a second,
and that's what my side
has to handle.
So, it's still a lot of data.
We are collecting something
like 70 petabytes a year,
and we keep adding, so right now
the amount of storage available is
on the order of 400 petabytes.
We're getting to
a pretty large scale.
And then we have to analyze all of this.
So we have one big data center at CERN,
which is around 300,000 cores,
but that's not enough,
so what we've done over
the last 15, 20 years,
we've created this large
distributed computing environment
around the world.
We link many different
institutes and research labs
together, and this doubles our capacity.
So that's our challenge:
to make sure that, after all the effort
the physicists put into
building this large machine,
in the end, it's
not the computing that
breaks the whole system.
We have to keep up, yup.
>> One thing that I always
find fascinating is people
who are dealing with real problems
that push our conception of
what scale starts to look like,
and when you're talking
about things like a petabyte
a second, that's beyond what
most of us can even
comprehend, let alone talk about.
One problem that I've seen historically
with a number of different
infrastructure approaches
is that they require a fair level
of complexity to go from this problem
to this problem to this problem,
and you wind up working
through a bunch of layers
of abstraction, and at the end of all
of this, we can run our blog
that gets eight visits a
day, and that just doesn't
seem to make sense.
Whereas what you're talking about,
that level of complexity
is more than justified.
So my question for you
is, as you start seeing
these things evolve and
looking at other best practices
and guidance from folks who are doing
far less data-intensive applications,
are you seeing that a
lot of the best practices
start to fall down as
you're pushing the theoretical
boundaries of scale?
>> Right, that's actually a good point.
Like, the physicists are very
good at getting things done,
and they don't worry that
much about the process,
as long as in the end it works.
But there's always this kind of split
between the physicists and
the computing engineers,
where we want to establish practices,
but at the end of the day,
we have a large machine
that has to work, so sometimes
we skip a couple of steps.
Still, there's
quite a lot of control
over things like data quality,
software validation,
and all of this.
But yeah, it's a
non-traditional environment
in terms of IT, I would say.
It's much more fast-paced than
most traditional companies.
>> You mentioned you had how many cores
working on these problems on site?
>> So in-house, we have 300,000.
>> If you were to do a full
migration to the public cloud,
you'd almost have to
repurpose that many cores
just to calculate
the bill at that point,
because across all the
different dimensions,
everything you wind up working
on at that scale
becomes almost completely non-trivial.
I don't often say that I'm not
sure public cloud can scale
to the level that someone would need to.
In your case, that becomes
a very real concern.
>> Yeah, so that's one debate
we are having now.
It has a lot of advantages to
have the computing in-house,
also because we
pretty much use it 24/7;
it's a very different type of workload.
We need a lot of resources 24/7,
so even the pricing is
calculated differently.
But the issue we have now
is that the accelerator
will go through a major upgrade
in just five years' time,
which will increase the
amount of data by 100 times.
Today we are talking about
70 petabytes a year,
and very soon we'll be talking
about exabytes.
So the amount of
computing we'll need there
is just going to explode,
so we need all the options.
We're looking into GPUs
and machine learning
to change how we do
computing, and we are looking
at any kind of additional
resources we might get,
and there the public cloud
will probably play a role.
>> Could you speak to the dynamic
of how you work together on
something like an upgrade of that scale?
I can't imagine that you
just say, "Well, we built
"whatever we needed, so, you know,
"throw it over the wall
and make sure it works."
>> Right, I mean, so I
work a lot on this boundary
between computing and
physics, and so internally,
I think we also go
through the same processes
as a lot of companies:
we're trying to educate people
on the physics side on how
to follow best practices,
because it's important.
One thing I also stressed
in the keynote
is that the reproducibility
and reusability
of scientific software
are pretty important,
so we teach people to
containerize their applications
and then make them reusable
and stuff like that, yup.
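One way to picture that reproducibility goal: a result should carry enough metadata to be rerun exactly. Here is a toy sketch of the idea (purely illustrative, not an ATLAS tool) that records the exact container image and input set alongside a result.

```python
# Toy provenance record: bundle a result with the container image
# digest and a fingerprint of the input files, so the same analysis
# can be rerun bit-for-bit later. All names here are hypothetical.
import hashlib
import json


def record_provenance(image_digest: str, input_files: list, result: dict) -> str:
    inputs_hash = hashlib.sha256(
        "\n".join(sorted(input_files)).encode()
    ).hexdigest()
    record = {
        "image": image_digest,         # exact software stack used
        "inputs_sha256": inputs_hash,  # fingerprint of the input set
        "result": result,
    }
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    print(record_provenance(
        "registry.example.org/atlas-analysis@sha256:abc123",
        ["events_001.root", "events_002.root"],
        {"selected_events": 4213},
    ))
```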
>> Anything about that
relationship you can expound on?
>> Yeah, so like this
keynote we had yesterday
is a perfect example of how
this is improving a lot at CERN.
We were actually using data from CMS,
which is one of the experiments.
Lukas is a physicist in ATLAS,
which is, kind of,
a competing experiment.
I'm in IT, and all this
containerized infrastructure
is kind of bringing us all
together, because computing
is getting much easier
in terms of how we share
pieces of software and
even infrastructure,
and this helps us a lot internally also.
>> So what in particular about Kubernetes
helps your environment?
You've talked about the 15 years
you've been on this distributed
systems build-out,
so it sounds like you were the hipsters
when it came to some of these solutions
we're working on today.
>> That has been like a major change.
Lukas mentioned the container part
for the software reproducibility,
but I have been working
on the infrastructure for,
I joined CERN as a student
and I've been working
on the distributed
infrastructure for many years,
and we basically had
to write our own tools,
like storage systems, all the
batch systems, over the years,
and suddenly with this
public cloud explosion
and open source usage, we can
just go and join communities
that sometimes have requirements
higher than ours,
and we can really focus on
application development.
If we start writing
software using Kubernetes,
then not only do we get this
flexibility of choosing
different public clouds or
different infrastructures,
but we also don't have to care so much
about the core infrastructure:
all the monitoring,
log collection, restarting.
Kubernetes is very important
for us in this respect.
We kind of remove a lot of the software
we were depending on for many years.
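As a concrete illustration of that hand-off, here is a minimal sketch using the official Kubernetes Python client (assuming `pip install kubernetes` and a working kubeconfig; the image and job names are hypothetical): submit a containerized analysis as a Job, and the cluster takes care of scheduling, retries, and the surrounding plumbing.

```python
# Minimal sketch: submit a containerized analysis step as a
# Kubernetes Job; the cluster handles scheduling and retries.
# The image name and job name below are hypothetical.
from kubernetes import client, config


def submit_analysis_job():
    config.load_kube_config()  # use the active kubeconfig context
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name="physics-analysis"),
        spec=client.V1JobSpec(
            backoff_limit=4,  # Kubernetes retries failed pods for us
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="OnFailure",
                    containers=[
                        client.V1Container(
                            name="analysis",
                            image="registry.example.org/analysis:v1",
                            command=["python", "run_analysis.py"],
                        )
                    ],
                )
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)


if __name__ == "__main__":
    submit_analysis_job()
```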
>> So these days, as you
look at this build-out
and what you're looking, not
just what you're doing today
but what you're looking to
build in the upcoming years,
are you viewing containers
as the fundamental primitive
of what empowers this?
Are you looking at virtual
machines as that primitive?
Are you looking at functions?
Where exactly do you draw
the abstraction layer,
as you start building this architecture?
>> So, yeah, traditionally
we've been using
virtual machines for like the
last 10 years almost,
or, I don't know, eight years at least,
and we see containerization
happening very quickly.
Maybe Lukas can say a
bit more about how this is
important on the physics side?
>> Yeah, so currently I think
we are looking at containers
as the main abstraction,
even as we move toward things
like functions as a service.
What's kind of special about
scientific applications
is that we don't usually just
have our entire code base
on one software stack, right?
It's not like we would
deploy a Node.js application
or a Python stack and that's it.
And so, sometimes you have
a complete mix of C++,
Python, Fortran, and all that stuff.
So this idea that we can build
the entire software stack
as we want it is pretty important.
So even for functions as a
service where, traditionally,
you had just a limited choice of runtimes,
this becomes important.
>> Like, from our side,
the virtual machines still
had a very complex setup
to be able to support
all this diversity of software.
With containerization,
all people have to give us
is a building block to run,
and it's kind of a standard interface,
so we only have to
build the infrastructure
to handle these pieces.
>> Well, I don't think anyone
can dispute that you folks
are experts in taking larger
things and breaking them down
into constituent components thereof.
I mean, you are, quite obviously,
the leading world experts on that.
But as you went through that process of,
I don't necessarily even
want to say modernizing,
but of changing your
view of those primitives
as you've evolved, did you
see challenges
in gaining buy-in
throughout the organization?
Was there pushback?
Was it culturally painful
to wind up moving away
from the virtual machine approach
into a containerized world?
>> Right, so yeah, a bit, of course.
But traditionally, like,
physicists really focus
on their end goal.
We often say that we
don't count how many cores
or whatever, we care
about events per second,
how many events we can process per second.
So, it's a kind of more
open-minded community maybe
than traditional IT, so
we don't care so much
about which technology
we use at some point,
as long as the job gets done.
So, yeah, there's a bit
of friction sometimes,
but there's also a push:
when you can demonstrate
that we get a clear benefit,
then it's kind of easier
to push it.
>> What's maybe also a little bit special
for particle physics is
that it's not only CERN
doing the research.
We are an international collaboration
of many, many institutes
all around the world
that work on the same project,
which is just hosted at CERN,
and so it's a very flat hierarchy
and people do have the
freedom to try out things
and so it's not like we
have a top-down mandate
on what technology we use.
Somebody tries something out,
and if it works and people see value in it,
then it gets adopted.
>> The collaboration with the data volumes
you're talking about as
well has got to be intense.
I think you're a little bit beyond the,
okay, we ran the experiment,
we put the data in Dropbox,
go ahead and download it, you'll get that
in only 18 short years.
It seems like there's
absolutely a challenge in that.
>> That was actually one of the key
points in the keynote:
a lot of
the experiments at CERN
have an open data policy
where we release our data,
and that's great because
we think it's important
for open science, but it was
always a bit of an issue:
who can actually,
practically, analyze this data
if they don't have a data center?
So one part of the keynote
was to demonstrate
that, using Kubernetes and
public cloud infrastructure,
it actually becomes possible for
people who don't work at CERN
to analyze these large-scale
scientific data sets.
>> Yeah, I mean maybe
just for our audience,
the punchline is
rediscovering the Higgs boson
in the public cloud.
Maybe just give our audience
a little bit of a taste of that.
>> Right, yeah, so
basically what we did:
the Higgs boson was discovered in 2012
by both ATLAS and CMS.
We used open data from CMS,
part of which has now
been released publicly,
and basically this was
a 70-terabyte data set
which, thanks to our
Google Cloud partners,
we could put onto public cloud
infrastructure, and then we
analyzed it on a large-scale
Kubernetes cluster, and--
>> The main challenge there
was that, like, we publish it
and we say you probably
need a month to process it,
but we had like 20 minutes in the keynote,
so we kind of needed a bit
larger infrastructure than usual
to get it down to five minutes or less.
In the end, it all worked out,
but that was a bit of a challenge.
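The speed-up is essentially fan-out: a month is roughly 43,200 minutes, so finishing in about five minutes needs on the order of 43,200 / 5 ≈ 8,640 workers' worth of parallelism, assuming the month estimate is for a single worker and the analysis splits cleanly across files. Here is a toy sketch of that sharding pattern, with local multiprocessing standing in for a Kubernetes cluster; the file names and workload are made up.

```python
# Toy fan-out: shard the input file list across parallel workers so
# wall-clock time shrinks roughly with worker count. In the keynote
# the fan-out ran across a large Kubernetes cluster; multiprocessing
# stands in for it here. File names and the workload are made up.
from multiprocessing import Pool


def analyze_file(path: str) -> int:
    """Stand-in for the per-file physics analysis; returns events kept."""
    return len(path)  # placeholder work


def run(files, workers: int) -> int:
    with Pool(workers) as pool:
        return sum(pool.map(analyze_file, files))


if __name__ == "__main__":
    dataset = [f"file_{i:04d}.root" for i in range(64)]
    print("total:", run(dataset, workers=8))
```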
>> How are you approaching, I guess,
making this more
accessible to more people?
By which I mean, not just
other research institutions
scattered around the world, but students,
individual students, sometimes
in emerging economies,
where they don't have access
to the kinds of resources
that many of us take for
granted, particularly those of us
who work for prestigious research institutions?
What are you doing to
make this more accessible
to high school kids, for example,
folks who are just dipping their toes
into a world they find fascinating?
>> We have entire
programs, outreach programs
that go to high schools.
I did this when
I was a student in Germany.
We would go to high schools
and host workshops,
and people would analyze a
lot of this data themselves
on their computers.
We would come with USB
sticks that had data on them,
and they could analyze it.
And part of the open data strategy
from ATLAS is also to use that open data
for educational purposes.
And then there are also
programs in emerging countries.
>> Lukas and Ricardo, really
appreciate you sharing
the open data, open science mission
that you have with our audience.
Thank you so much for joining us.
>> Thank you.
>> Thank you.
>> All right, for Corey
Quinn, I'm Stu Miniman.
We're in day two of two
days live coverage here
at KubeCon + CloudNativeCon 2019.
Thank you for watching theCUBE.
(upbeat music)
