>> Announcer: Live from Orlando, Florida,
it's theCUBE
covering PentahoWorld 2017.
Brought to you by Hitachi Vantara.
>> Welcome back to theCUBE's live coverage
of PentahoWorld brought
to you by Hitachi Vantara.
I'm your host Rebecca Knight,
along with my cohost Dave Vellante.
We are joined by Derek Mathieson,
he is the group leader at CERN.
Welcome, Derek, glad to
have you on the show.
>> Well, glad to be here,
thank you very much.
>> So, CERN, which is of course
the European Organization
for Nuclear Research.
And you know we think
of it as this place of physicists
and engineers working together
to solve these problems.
And probe the mysteries of
the universe but in fact,
CERN is a technology organization.
>> Absolutely, I mean, I think that's the-
CERN has this reputation of
being exclusively physics.
I mean, it is the world leading
particle physics laboratory.
But in fact, in the end,
yeah, we're an infrastructure organization
who provides all the
technology, all the science.
And all the scientists and engineers come
to CERN to do their work.
But CERN itself provides the facilities.
So, our main focus, in
fact, is technology.
Computer science, civil
engineering, construction.
I mean, we built cathedral
size concrete structures
450 feet underground,
17 mile long tunnels.
I mean, this is civil
engineering in the grand scale.
And that's actually one
of the major focuses.
Is that CERN, although it's
a physics organization,
one of the difficulties we
have as an organization is
to explain to people, in fact,
what we're looking for
when we're recruiting.
When we're contacting other universities.
It's all about the fact that we're
not looking for physicists,
we're looking for engineers
and technology specialists
to come and work at CERN.
>> So talk to us about some of the new,
exciting projects that
you're working on there.
>> Oh, I mean, there's a lot going on.
Obviously, the reason I'm here
today is all about the work
that we're doing with Pentaho.
So we're, you know, building
a new data warehouse.
My group's actually responsible
for the administrative computing of CERN.
So basically running CERN as a business.
I mean this is, there's a budget of
around about one billion U.S. dollars.
Going into CERN every year,
in order to do all this physics research.
So obviously we have a
responsibility to treat
these tax dollars
carefully and, you know, spend them wisely.
So a lot of my work is to make sure
that we have the
appropriate infrastructure,
controls and proper technology there.
To make sure that it's used
effectively and wisely.
>> So paint a picture
of that infrastructure
for us, if you would.
What's it look like if we
took a peek under the tent?
>> Well, I mean, what's
quite nice about it is
with the technology
infrastructure that we have.
So we have a huge computer center.
There's a hundred thousand
CPUs in our computer center.
That's mainly used for doing physics
but because we have all
this infrastructure there,
we can use part of it to
also run the administration.
Which gives us the ability to run a
real world class technology stack
to actually run the organization.
So we have a huge data warehouse.
Which gives a very rapid
response to the physicists
and engineers who actually want
to go on and do their work.
My job is to make sure
that the administration
of CERN doesn't get in their way.
So we want to provide them the
facilities so they just get
on with their job and all
the other things to do
with actually running the
organization are my problem
and the team that works for me.
And a good example is
that CERN literally sits
on the border between
France and Switzerland.
So we have, you know, we
care about things like,
there's 80 different
customs forms that we have
to worry about on a daily basis just
as we move materials around the site.
So we have such an unusual organization
but it's unique in the world.
And that's what attracts
people to work there
is all these new challenges that we got.
It's really a fantastic place.
>> And the view is pleasant I bet.
>> Oh yeah.
(all giggling)
>> Okay, so tell us more
about the infrastructure.
So you talked about this
really fast data warehouse.
100,000 CPUs,
is it all sort of on prem?
Is it a mix sort of on prem and the Cloud?
What's the data warehouse, you know,
give us a sense of what
that infrastructure is.
'Cause people hear data
warehouse, they think you know,
kind of old, clunky data warehouse.
You're talking about this
super high performance.
>> Exactly, in fact, that's
one of the challenges
that we face is.
We've got scientists
who are used to dealing
with high volumes of
data with high expectations.
Our particle detectors produce around
2 petabytes of data per second.
So they're used to dealing
with large amount of data.
So immediately when they started looking
at the administration of the organization,
they had the same high expectations.
They want it to be fast, they
want it to process the data.
Large quantities of
data, very quickly indeed
and give the answers
(snaps) in a split second.
So to do that we have to
obviously put quite a lot
of hardware behind it and also use
good technical strength as well.
We're quite big users of Oracle at CERN.
We have a big Oracle database
which is the principal place
where we keep most of our data.
And then we use Pentaho
on top of that in order
to do all the reporting, the analytics,
building the cubes, so
all this kind of thing.
And their user base is very transient.
So there's around fifteen thousand people
who're actually working
at CERN at any one time.
Half of the world's particle
physicists work at CERN.
>> Rebecca: Wow.
>> So, they're coming
and going all the time.
They don't want to worry
about how to get the data.
So it has to be there, has
to be there right away.
Has to be easy to use
and easy to understand.
These people live and work
and breathe particle physics.
They don't worry about
the budget and the details
about how to do all this stuff.
This is something where the
accountants have to get there.
Get it in such a way
that it's easy for them
to do the right thing and make sure
that we stay compliant with
the various regulations.
And make sure that the
organization continues to function
as a business while still getting on
with our primary mission of
particle physics research.
>> And that infrastructure is primarily
on premise, that correct?
>> It's on premise, the
vast majority of it.
In fact, one of the, we
have two main data centers.
So there's one physically
located at CERN in Geneva.
And then there's another one over
in the (mumbles) institute,
in (snaps)
>> The other place.
>> The other place.
(both laughing)
>> Okay.
>> Yep.
>> And that, presume, because you've got
such volumes of data.
You can't just be moving that stuff
around up into the Cloud.
>> Right, in fact yeah, we have a lot
of high speed data links between the
different data centers in order to.
We have a copy of quite a
lot of the data in fact.
The principal physics data
is copied, not only at CERN,
which is what's called a Tier-0 site where we
have all the data to start with.
But we also copy it to I think it's around
about seven different
institutes around the world.
So they have a first-line copy as well.
Altogether we have a network of around
a hundred computer
centers working for CERN
in some way or other.
That's part of what we call
the LHC Computing Grid,
which is (mumbles) a planetary data center
in computer infrastructure
to do all this processing
of the LHC data.
>> I'm going to ask you to go back to
about the organizational structure.
I mean, you described this office situated
on the border of France and Switzerland.
Where half the world's
particle physicists work.
What is the culture like?
And how do you get-
and as you said also the
administration's job is
to really get out of their way
so they can do their thing.
What is the culture like there?
How do people work together?
How do people collaborate?
What do you do when there's disagreement?
>> I mean this is one of
the unique aspects of CERN.
Is bringing people together.
There's around about 90 different
countries represented at CERN.
Around about 100 different nationalities,
all working on site.
It's very much like a
university environment.
We have a canteen where
people will come in.
They're always saying that
probably most of the physics
and most of the science
discoveries are happening
within the canteen as people meet together
from all over the world.
We have countries, India, Pakistan,
have just joined as associate members.
We've got 22 member states.
Mainly around Europe but now
we have a policy of enlargement.
So we're actually trying to make
the organization even larger.
Touching more countries around the world.
United States is an observer
now within the organization.
So they actually participate
in the CERN council
and they're also major players in some
of the large LHC experiments as well.
But yeah, on a day to day basis,
I'll be sitting in the restaurant
and there will be Nobel Prize winners.
We have our director general,
she will be there as well,
having lunch with everyone else.
So it's very much a
leveling organization
where everyone feels free
to speak to each other.
And discuss the matters of
the day and particle physics.
>> So what do you guys talk about?
>> (laughs) What's the
canteen conversation?
>> I think this is the
utter geek speak usually.
That's the main problem in CERN is that
people are passionate about what they do.
So they come to CERN,
they love what they do,
they talk about it all the time.
So, I mean, people will be talking
about the latest generation
of the CPU architecture, GPU programming.
How do we do simulations
with petabytes of data?
This is lunch time conversation.
And evening and everything else.
>> So you're not talking about
a football game, right?
You're talking about this sort of,
talking shop mostly right?
>> There is a football team,
there is a rugby team as well.
There's real life as well at CERN
but yeah, I mean, most people are there
because they're passionate
about what they do.
>> Obviously you're listening to those
conversations you must
pick up a lot of it.
>> Yeah, I know, I mean,
I think it's if you work
at CERN and you're at a dinner party,
someone laughs, "Oh you work at CERN,
tell me all about physics."
So you pick up a bit about it of course.
Everyone can speak a little bit
about what we're doing at
CERN and I think that's
an imperative because we work there.
Of course you hear about what's going on
and understand a little bit about it.
But I would never claim to
be a physicist of course.
>> Rebecca: You can fake it though.
>> I have lunch with
physicists, I'm not one myself.
>> How about Pentaho?
You painted the picture of
the infrastructure before.
Where does Pentaho fit?
And how are they adding value?
>> We've been using Pentaho
now for the last few years.
We started, I mean, what
really attracted us is actually
this combination of open-source
plus proprietary software.
We like the core and the
open-source nature of it
which very much fits with the values
of CERN as well as being an open lab.
And sharing everything that we do.
So we started, as I say,
with Pentaho a few years ago.
Now, it's a core component.
It's a core strategic
component of the administration
and also used in other areas as well.
So it's also used in some
of the more technical
infrastructure areas in terms of:
how do we actually run the lab?
Parts of the infrastructure in terms
of monitoring the different parts
of the accelerator complex.
And even in terms of, you know,
the maintenance of the
buildings, all of that.
So it's really, you know,
core within the organization
as a core component for us.
>> So, CERN is an organization then as-
I'll use the word insistent,
if you will, on open-source
as a component.
So that puts pressure on
companies like Pentaho
to pay attention to the next project.
Maybe contribute, maybe not.
But certainly integrate.
Score card, how have they done on that?
What would you like to see
them do better in that regard?
And what kind of
open-source projects do you-
and you may not be able to answer this.
But, might your organizations
see on the horizon
that you want Pentaho to capture?
I mean, obviously 8.0, you've heard about,
Spark and bringing in Kafka
and the like.
But maybe you could comment.
>> Absolutely, I think
one of the areas
that really attracted us
was the open-source nature.
And certainly Pentaho's
movement in that direction
particularly, I think, was the integration
with Hitachi as well.
They're seeing many other
projects now being integrated
into that sort of Pentaho world.
This is something that
was interesting to us.
Of course because of our
Cloud based infrastructure.
The idea of scaling up and scaling out.
And they're going with
the open-source projects,
in particular the Apache projects.
Which was really
interesting to us as well.
Something that we've been
working on a bit ourselves.
And now to hear that Pentaho
was doing that as well.
That was great, a good
piece of news for me
because it was something
that we have been struggling
with is basically spreading out.
We've got fifteen thousand users.
We want to have a dynamic infrastructure
where we can actually
provision more servers
where necessary in order
to be able to take load
when we need it.
But at the same time we don't
want to waste the resources
when they're off doing something else.
>> Over the course of
last decade, let's say,
has there ever been a
tendency for-
'cause you've got so many
alpha geeks running around.
To say, "Hey, I can take
these open-source components
and kind of do it myself."
>> Derek: Yeah.
>> "I don't need the Pentaho load balancer,
I got YARN to negotiate my resources.
Look what I built."
And so, how do you manage that?
>> No, I mean, you're absolutely right.
It's a problem here; there's
always the risk of the 'not
invented here' syndrome where,
"I could do it better."
And we have to pressure against that.
But, I mean, I think the important part
of the issue is to take the bigger picture.
If it's already done well,
we don't need to do it again.
Build on top of it, make something better
on top of something that already exists.
And that's the thing, that's
the message that we can give
to any of the engineers working at CERN.
Is, "You can do so much
more if you already
use the infrastructure
that's already solid."
And that's part of this,
you know, reuse, of course.
Open-source software allows us to build
on things which are already solid.
We don't need to make another one of them.
We'll make something on top of it.
That's a primary message
that we try to give.
>> So here we are at PentahoWorld
and you're with a bunch
of other practitioners.
Sharing best practices,
talking about how you use the product,
learning from them too.
What are some of the takeaways?
And how much are you actually talking
to them versus talking to
the Pentaho product people?
>> We did a presentation yesterday.
The focus of our presentation
was managing Pentaho.
So, one of the things
that we've been using now
for a number of years is you
have to have an infrastructure
to be able to actually take care
of all the different artifacts,
all the different reports.
We have many, many different
users who want to be able to
use Pentaho at the same time
creating their own artifacts.
I mean we have to have
some way
to actually manage all this landscape.
Although Pentaho has got
some tools necessary,
that was one of the areas
that we felt we could
add some value in there.
So we've been building on top
of the existing Pentaho APIs.
Building an infrastructure
to make it easier
to support for other people.
And what was quite nice
is we were speaking
to some of the other attendees.
And that's exactly the
kind of thing they've
been worrying about as well.
And there were even some presentations
of people doing a similar approach
in their own organizations.
On how they were actually
trying to build some kind
of architecture on top of Pentaho just
to manage the whole thing.
When you have hundreds
of reports and hundreds
of artifacts and very complicated
data warehouse cubes,
you need something on top of that
to actually just manage the whole thing.
And that's something that
we've been focused on.
And I see other people are
doing the same kind of thing.
So I can imagine that
Pentaho will be taking note
of this and probably
incorporating some of the ideas.
>> It's sending a loud and
clear message to Pentaho,
yes absolutely.
>> How about the event?
You've been to at least
two that I know of.
I don't know if you were at the original.
>> I've been to three altogether.
>> Okay, so you've been to,
I think all of them, right?
>> I could have been to all of them, yeah.
>> I think the first one was
14, I think, I'm pretty sure.
Things you've taken away?
You know, interesting conversations?
>> I think it's the
main reason we come in.
It's a long way for us to come all the way
from Geneva to come here.
It's really important for us to touch base
with other people using the product.
It is an open community, people do like
to talk to each other about,
you know the new things that are happening
within the Pentaho community.
And I think face to face
contact, in the end,
is very hard to beat.
And coming to an
event like this you actually
get the opportunity to
speak to people over lunch.
Or in the evening events
you can talk to them
and actually find out what it's
really like to use Pentaho.
>> Great, well thank you so much Derek
for coming on theCUBE.
>> Thank you very much.
>> I'm Rebecca Knight for Dave Vellante.
We will have more from
PentahoWorld just after this.
