Welcome to BIG SCIENCE, a programme that
we film here at CERN, just outside Geneva.
BIG SCIENCE is a programme where we talk
about all aspects of science, not just about particles
or the laws that govern the Universe. We will talk about
the science all around us, the science of everyday life.
In this episode we will also give you the chance to go
behind the scenes at CERN, to see what CERN
and its buildings, its corridors and
its offices look like from the inside.
CERN is huge - 200 hectares, or the
equivalent of 286 football pitches.
Today we will show you the Data Centre.
But to start with, we’re going to talk about Big Data.
You might have noticed that we often have
trouble explaining what Big Data is. But Big Data
has been part of our lives for a while now.
Explanations:
What is Big Data?
Big Data is not new, it’s simply the
collection and analysis of large amounts of data.
Images, e-mails, GPS data, numbers, archives of all kinds...
we produce lots of data, huge
amounts of data that we need to store.
The problem is that the volume of data is
constantly increasing. In the 1960s, CERN’s data
was already being analysed by a computer,
but it was a very big computer that filled an entire building.
Physicists came to CERN from all over the world
to analyse the data. Time went by.
The data collected continued to grow.
And then we found a way to share the data:
computer networks, in particular the Internet.
With the invention of the World Wide Web at CERN in 1989,
it became very easy to connect computers to each other.
It was a revolution in how we communicate.
In the 2000s, there was so much data that it was no longer
possible to analyse it all at CERN,
even with buildings full of computers.
So the data was distributed all over the world.
The computers at hundreds of different institutes were used
and it was no longer necessary to know where
the data was stored in order to have access to it.
This is called GRID COMPUTING.
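The key idea — that physicists no longer need to know where the data physically sits — can be sketched as a toy replica catalogue. Everything below (file names, site names) is invented for illustration; real grid middleware is far more elaborate:

```python
# Toy replica catalogue: a logical file name maps to the sites
# holding physical copies. Users ask for the logical name only;
# the grid middleware chooses a replica on their behalf.
catalogue = {
    "lhc/run1/events-0001.dat": ["CERN", "FNAL", "RAL"],
    "lhc/run1/events-0002.dat": ["IN2P3", "KIT"],
}

def locate(logical_name):
    """Return a site holding a copy, hiding the physical location."""
    sites = catalogue.get(logical_name)
    if not sites:
        raise FileNotFoundError(logical_name)
    # A real scheduler would pick the closest or least-loaded site.
    return sites[0]

print(locate("lhc/run1/events-0002.dat"))  # prints "IN2P3"
```

The user's code is identical whether the file lives at CERN or on another continent, which is exactly the transparency the transcript describes.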
Today, Big Data is used by everyone, not just scientists.
For example, we have access to traffic information, weather 
forecasts, flu outbreak predictions – all thanks to
the collection of large amounts of data.
Maïté Barroso Lopez, you are the deputy head
of the Information Technology department,
also known as IT – we often call it that in French too.
So you know all about Big Data. Since the invention of the World Wide Web here 30 years ago,
what has been the biggest challenge,
the biggest problem you’ve had to deal with?
The biggest challenge we have is being inundated
with data, the huge quantities of data produced
by the accelerator, the LHC,
which is working really well, even better than
expected, producing 600 000 000 collisions per second,
which is the equivalent of one DVD
per second arriving at CERN’s Data Centre.
- Do you keep all the data?
No. We don’t keep all the data. That one DVD per second
is only 1% of all the data; the rest is filtered out and...
- So the filtered data still amounts to one DVD per second?
- Yes, per second.
- Every day?
- Every day of the year, day and night, all the time,
one million episodes of your favourite TV series per year.
- And every year you have to keep storing
the data so that physicists...
All the data needs to be stored and we also need
the computing power to allow 10,000 or so physicists
all over the world to analyse the data.
So first you have to filter and store the data,
then provide the computing power.
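Taking a single-layer DVD as roughly 4.7 GB, the quoted "one DVD per second, every day of the year" can be turned into a back-of-the-envelope annual volume. This is a sketch under that assumption, not an official CERN figure — the actual stored volume is lower, since the accelerator does not run continuously:

```python
# Back-of-the-envelope: sustained "one DVD per second" over a year.
DVD_GB = 4.7                       # assumed single-layer DVD capacity
SECONDS_PER_YEAR = 365 * 24 * 3600

rate_gb_per_s = DVD_GB             # rate remaining after the ~99% filtering
yearly_gb = rate_gb_per_s * SECONDS_PER_YEAR

# 1 PB = 1e6 GB (decimal units)
print(f"{yearly_gb / 1e6:.0f} petabytes per year")  # prints "148 petabytes per year"
```

So the headline rate would come to well over a hundred petabytes a year if sustained around the clock, which shows why filtering before storage matters so much.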
What do you do with the data you collected a few years ago?
The LHC has been operating for a few years.
How are you able to preserve all this data?
You have to have the right tools.
- Yes, we need the right tools. We keep all the data that
the accelerator has produced, even data from the previous
accelerator that we used to have. We keep
all the data forever in digital form.
Before, when we used to write everything down, it was easy...
- We had books and everyone knew how to read.
- Exactly, we had books, but for digital,
we not only have to keep the data but also the tools
to read the data, to read the formats that the data is in.
I’ll give you an example from my own life:
I did a project at the end of my studies
and it was all stored on a floppy disk at my parents’ house.
Even though my parents’ computer is very old,
we can’t read the floppy disk, we don’t
have the equipment. I wanted to show my kids, and that’s when
I realised it couldn’t be read; it was lost. So we not only
have to store the data, but also the technology,
and tools to read it, and this is something
we are working on because we need
to keep the data forever.
- So now we are going to see where the data is stored,
in the Data Centre.
On BIG SCIENCE we’re taking you behind the scenes at CERN:
in each programme we discover a new location and right
now we are going to see the place where all this data is stored.
And there’s a lot of it.
The building just behind me is the CERN Data Centre.
It was built in the 1970s
and the sound you can hear is the fans that cool
the computers. It also has an extension,
located in Budapest in Hungary,
but together they form a single data centre.
But what exactly are they doing in there?
And what does it look like inside?
Come with me and I’ll show you.
Look at this, it’s a computer just like the one you have
at home, and what you see here is 200 000 processor cores,
200 000 computer brains, if you prefer.
This is where all of the LHC data is stored
- the LHC is CERN’s particle accelerator.
It is stored either on magnetic tapes,
like the ones this robot is moving around, or on disks.
In 2015 – just that year alone – 40 million gigabytes
were stored: the equivalent of almost 10 million DVDs.
This data can be processed by physicists
all over the world for their research. They access it
via the high-speed network installed here at CERN.
In addition, the whole region benefits: just behind me
is one of the main Internet connection nodes for the area.
But the Data Centre is also the starting point
for the computing grid.
This is a planet-sized computer, computing
and storage power spread across 170 centres
in 42 countries worldwide. Around 2 million computing jobs are run
every day. This is what is needed to analyse and store
the data produced by CERN’s particle accelerator,
the LHC. And only 20%
of that computing power is located here.
Can you hear that? It’s noisy.
- Hello Einar.
- Hello.
You are the director of UNOSAT,
a United Nations programme that has been based
at CERN for 15 years. But why at CERN?
At CERN they have the benefit of a large IT infrastructure.
The storage volume they have here is huge and they have access to everything: infrastructure, information, computers...
- They have the experience of the LHC.
- Exactly, they have the experience
and we are lucky to be able to take advantage of it.
- So you often work with Maïté, whom we just saw?
- Almost every day.
So what exactly do you do? UNOSAT has
something to do with satellites, doesn’t it?
Yes, exactly. UNOSAT is part of the United Nations Institute
for Training and Research
and we use satellite images to help
in the event of humanitarian crises,
after natural disasters for example. We also use
the same technologies for Human Rights applications
and for development.
So let’s say there’s an event, a flood.
The government of Pakistan, for example
– we’re looking at images of that country
behind you – calls you and asks for photos.
Yes, as soon as we get the call we ask
the operator of the satellite,
which is constantly going around the world,
to take photos as quickly as possible.
- How fast can it be done?
- Typically, if we call in the afternoon from here in Geneva
we have the photos by the following morning.
- It’s not any quicker than that?
- No, that’s the way it is.
- You have to wait
for the satellite to go around the world?
- For the satellite to get into position over the flooded area.
- When you receive the image, what do you do with it?
- Straight away we start to analyse the images.
We download them here at CERN,
analyse them and extract the information
that’s important for the aid workers.
- So you can see the number of people affected,
the roads affected,
that sort of thing?
- In combination with the other information we have, yes.
We extract the flooded area, for example,
and we already have the road network
in our database here at CERN, so if we combine
the road network with the flooded areas,
we can see which roads are flooded and not accessible.
- So you can say to aid workers on the ground,
don’t take that road, you won’t be able to access that village?
- Yes, and if we combine information
on population distribution with the flooded area,
we know straight away how many people
are directly affected by the flooding.
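The overlay described above — intersecting the flooded area with the road network and a population layer — can be sketched on a coarse grid of cells. All of the data below is invented purely for illustration:

```python
# Toy overlay analysis: each cell of a coarse grid is an (x, y) tuple.
flooded = {(2, 1), (2, 2), (3, 2)}                     # cells under water (invented)

roads = {
    "N-5": [(0, 2), (1, 2), (2, 2), (3, 2)],           # a road as a list of cells
    "M-1": [(0, 0), (1, 0), (2, 0)],
}
population = {(2, 1): 1200, (3, 2): 800, (1, 0): 500}  # people per cell (invented)

# A road with at least one flooded cell is impassable.
blocked = [name for name, cells in roads.items()
           if any(cell in flooded for cell in cells)]

# People living in flooded cells are counted as directly affected.
affected = sum(p for cell, p in population.items() if cell in flooded)

print(blocked)   # prints ['N-5']
print(affected)  # prints 2000
```

Real UNOSAT analyses use satellite imagery and proper geospatial tooling, but the principle is the same: combine the new flood layer with reference layers already held in the database.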
- And then you distribute all the data. But at the same
time, you’re also using CERN’s networks
to distribute the data to all the aid workers
at the same time, to all the people who need it?
Exactly, the network capacity we have access to here
at the Data Centre is very important because
we can send information to all the players
at the same time – it’s also a way to help
with the coordination of these efforts.
- Thank you very much for all your explanations.
I don’t think we’ll look at the figures we see
in the newspapers in the same way any more, when they say that 10 000, 13 000 or 100 000
people are thought to have been affected
by a natural disaster. You’re the ones behind those figures.
- Thank you.
- Thank you.
That’s all from BIG SCIENCE for today. I hope that we’ve
helped you to learn a little bit about Big Data.
We’ll see you again very soon.
