If you’ve watched a video or two of mine,
by now, you’ll have recognized that I am
a physicist.
And not just any kind of physicist.
I’m a particle physicist, which means that
I get all excited about quarks and leptons
and big, monster particle accelerators.
I mean, this stuff is totally cool.
By processing an enormous amount of data taken
by the LHC, my colleagues and I can learn
a great deal about the laws that govern the
universe.
But “processing the data” is not
something easily accomplished.
To do that requires tremendous computer resources
and that’s the point of today’s video.
So how much data are we talking about?
Just how much information is recorded by an
LHC experiment?
I could make this a short video and just say
twenty petabytes a year.
But if you’re not a computer wonk, you probably
won’t know what that means.
So let’s get some context.
Computer information is stored in a series
of ones and zeros.
Eight ones and zeros is called a byte.
After that, we use the metric system to name
larger and larger sets of data.
A kilobyte is a thousand bytes of information,
a megabyte is a million and a gigabyte is
a billion.
And a gigabyte is already a lot of information.
A gigabyte is seven minutes of HD-TV.
Two gigabytes is the information stored in
a shelf of books 60 feet long.
And a standard DVD can hold about 5 gigabytes.
However, gigabytes are small potatoes in the
LHC world.
A terabyte is a trillion bytes and a petabyte
is a quadrillion bytes.
In other words, a petabyte is a million gigabytes.
And a petabyte is the most relevant unit for
the LHC data.
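To make those prefixes concrete, here is a minimal Python sketch. It assumes the decimal (power-of-ten) prefixes described above and the round figure of 5 gigabytes per DVD; the function names are hypothetical, chosen only for illustration.

```python
# Decimal (metric) prefixes: each step up is a factor of 1000.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]
BYTES_PER = {name: 1000 ** (i + 1) for i, name in enumerate(PREFIXES)}


def petabytes_to_gigabytes(petabytes):
    """Convert petabytes to gigabytes using decimal prefixes."""
    return petabytes * BYTES_PER["peta"] // BYTES_PER["giga"]


def dvds_needed(petabytes, dvd_gigabytes=5):
    """Number of DVDs needed, assuming a round 5 gigabytes per disc."""
    return petabytes_to_gigabytes(petabytes) // dvd_gigabytes


print(petabytes_to_gigabytes(1))  # 1000000
print(dvds_needed(20))            # 4000000
```

Under those assumptions, twenty petabytes a year works out to four million DVDs a year.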
So how big is a petabyte?
Suppose that we represent a single byte by
a floor tile that is half a square meter. 
That’s a square 70 centimeters on a side
or a little over two feet square for my American
viewers.
 
A kilobyte is then 500 square meters, which
is an eighth of an acre, or half the size
of the parcel of land your house
sits on if you’re a typical American suburban
homeowner with a quarter-acre lot.
 
A megabyte is much bigger and corresponds
to the size of the Pentagon if you include
the parking lots. 
That’s half a square kilometer. 
It’s also about the size of Vatican City.
 
A gigabyte is a thousand times bigger still
and is the size of Tulsa, Oklahoma, a fine
town if ever there was one, and birthplace
of Route 66.
It has an area of about 500 square kilometers.
 
A terabyte is equivalent to half a million
square kilometers. 
That’s about the same as the combined area
of four U.S. states: Illinois, home of Fermilab,
my favorite laboratory, plus Indiana, Wisconsin
and Ohio.
If you’d like to imagine a single country
instead, that’s the area of Thailand.
 
But a petabyte is represented
by half a billion square kilometers and, for
that, you need the surface of the entire Earth.
 
I hope this cements just how big a petabyte
is. 
If a byte is as big as a floor tile, a petabyte
is the surface of the entire planet. 
And remember that the LHC experiments record
lots of petabytes per year, and that is a
ton of data.
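The floor-tile arithmetic is easy to check. The sketch below assumes only what the analogy states: one byte is one tile of half a square meter, and each prefix is a factor of a thousand. The names are hypothetical, for illustration.

```python
import math

TILE_AREA_M2 = 0.5        # one byte = one floor tile of half a square meter
M2_PER_KM2 = 1_000_000    # square meters in one square kilometer


def tiled_area_km2(num_bytes):
    """Area covered, in square kilometers, if every byte is one tile."""
    return num_bytes * TILE_AREA_M2 / M2_PER_KM2


# Side length of a single square tile, in meters (about 0.71).
tile_side_m = math.sqrt(TILE_AREA_M2)

UNITS = [("kilobyte", 3), ("megabyte", 6), ("gigabyte", 9),
         ("terabyte", 12), ("petabyte", 15)]

for name, power in UNITS:
    print(f"{name}: {tiled_area_km2(10 ** power):,.4f} square kilometers")
```

The areas run from 500 square meters for a kilobyte up to 500 million square kilometers for a petabyte, matching the comparisons above.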
CERN is ready for this enormous amount of
data.
Combined with the Wigner data center in Budapest,
Hungary, CERN has available 150 petabytes
of disk storage.
That’s enough to store over a thousand years
of HD movies.
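That movie comparison can be checked with the earlier figure of seven minutes of HD-TV per gigabyte. This is a rough sketch, assuming decimal prefixes and a 365-day year; the names are hypothetical.

```python
MINUTES_PER_GB = 7                 # from the earlier figure: 1 gigabyte = 7 minutes of HD-TV
GB_PER_PB = 1_000_000              # a petabyte is a million gigabytes
MINUTES_PER_YEAR = 60 * 24 * 365   # assumes a 365-day year


def hd_years(petabytes):
    """Years of continuous HD video that fit in the given number of petabytes."""
    minutes = petabytes * GB_PER_PB * MINUTES_PER_GB
    return minutes / MINUTES_PER_YEAR


print(round(hd_years(150)))
```

With these inputs, 150 petabytes comes out at roughly two thousand years of HD video, so “over a thousand years” is on the safe side.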
The CERN computing facility can absorb up
to 10 gigabytes a second.
And each year, the LHC experiments generate
over 50 petabytes of data that is stored to
tape.
So I’ve just been talking about storage
capacity, but you also need computers to crunch
the data.
For the CMS experiment, we’re talking about
100,000 independent CPU cores, spread across
the globe in a giant network called the Grid.
The Grid consists of over sixty independent
computer centers, distributed across the world.
If you try to run a computer program that
analyzes LHC data, the system scours the world
for un-utilized computers and runs your program
on the distant computer.
When the computer is finished, it ships back
the result to you.
If you’re going to be shipping data all
across the world, you need excellent connectivity.
You really do need primo networks.
To give you a sense of scale, if you needed
to send a petabyte of data from Europe to
the US using DSL, it would take ten years.
Even using the network cable connection you
might have to your house would take eight
months.
However, using the state-of-the-art Transatlantic
links that run at 340 billion bits per second,
we can transfer a petabyte of data in under
seven hours.
That’s smokin’.
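Here is a back-of-the-envelope version of that comparison. The only link speed actually quoted is 340 billion bits per second; the DSL and home-cable speeds are not stated, so the sketch backs them out from the quoted transfer times, and they should be read as implied values rather than measured ones. It also ignores protocol overhead. The names are hypothetical.

```python
PETABYTE_BITS = 8 * 10 ** 15       # 10^15 bytes, 8 bits per byte
SECONDS_PER_YEAR = 365 * 24 * 3600


def transfer_hours(bits_per_second, num_bits=PETABYTE_BITS):
    """Hours needed to move num_bits at a steady rate, ignoring overhead."""
    return num_bits / bits_per_second / 3600


def implied_rate_mbps(seconds, num_bits=PETABYTE_BITS):
    """Average rate, in megabits per second, implied by a transfer time."""
    return num_bits / seconds / 1e6


transatlantic_hours = transfer_hours(340e9)                    # quoted link speed
dsl_mbps = implied_rate_mbps(10 * SECONDS_PER_YEAR)            # "ten years"
cable_mbps = implied_rate_mbps(8 / 12 * SECONDS_PER_YEAR)      # "eight months"

print(f"Transatlantic link: {transatlantic_hours:.1f} hours")
print(f"Implied DSL rate: {dsl_mbps:.0f} megabits per second")
print(f"Implied cable rate: {cable_mbps:.0f} megabits per second")
```

The transatlantic figure lands at about six and a half hours, consistent with “under seven hours.” The implied home rates come out at roughly 25 and 380 megabits per second.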
When you get right down to it, the discoveries
of the LHC rely crucially on computing systems
around the world.
And that trend will continue.
So the next time you feel the need to swear
at the network responsiveness of your home computer,
keep in mind the problems of the LHC computer
professionals.
Your problems could be way worse.
