So this base concept of virtualization
has then introduced us to the Cloud.
So this Cloud buzzword that's out there has a lot of
different meanings, and depending on how far in the stack you are,
it could mean these things... yeah, is there a question?
(new speaker)
Excuse me, we have a question.
(Jonathan Klinginsmith)
Yeah, go ahead.
(new speaker)
Yeah, in your previous slide...
[pause]
on a physical machine you used the local hard drive
for the Hadoop Distributed File System. So when you have
your virtual machine, are you mounting a partition
on the local hard drive to have your Hadoop file system on, then?
(Jonathan Klinginsmith)
So, and we'll get into this a little bit more with some of the Clouds,
but there are capabilities out there in what's called elastic block storage,
where you can actually mount this block storage, and that then makes it persistent.
So if you're not worried about persistent storage
and you just want to install on the ephemeral storage,
you can install locally on the instance storage that's available
on those devices. But if you actually want to have persistent data,
typically the best practice is to install that on block storage
that you attach to a running instance. Does that answer your question?
[pause]
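The difference between ephemeral instance storage and block storage attached to a running instance can be illustrated with a small toy model. This is only a sketch: the class names, the volume name, the instance name, and the mount point are all hypothetical, and no real cloud API is being called.

```python
from dataclasses import dataclass, field


@dataclass
class BlockVolume:
    """A persistent volume that exists independently of any instance."""
    name: str
    files: dict = field(default_factory=dict)


@dataclass
class Instance:
    """A toy virtual machine with ephemeral (instance) storage."""
    name: str
    ephemeral: dict = field(default_factory=dict)
    attached: dict = field(default_factory=dict)  # mount point -> BlockVolume

    def attach(self, volume: BlockVolume, mount_point: str) -> None:
        # Attach a block volume to the running instance at a mount point.
        self.attached[mount_point] = volume

    def terminate(self) -> None:
        # Ephemeral data is lost with the instance; volumes are only detached.
        self.ephemeral.clear()
        self.attached.clear()


volume = BlockVolume("hdfs-data")      # hypothetical volume name
node = Instance("hadoop-node-1")       # hypothetical instance name
node.attach(volume, "/mnt/hdfs")       # hypothetical mount point

node.ephemeral["scratch.tmp"] = b"temporary"
node.attached["/mnt/hdfs"].files["block_0001"] = b"persistent"

node.terminate()
print(node.ephemeral)  # ephemeral data is gone
print(volume.files)    # data on the block volume is still there
```

In this model, anything written to the instance's own storage disappears when the instance goes away, while the volume keeps its contents and could be attached to a new instance.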
So, we touched on this a little bit, so I'll kind of go through this,
but the reason the Cloud is attractive both for academia and for industry
is that it provides some nice properties for us to be
able to look at challenging large problems.
So we've talked a little bit about virtualization,
everyone's fully aware of the Internet,
and both of those brought together bring us
to this concept of the quote-unquote Cloud, this buzzword.
And so some of these properties that become interesting for the Cloud
are scalability, this ability now to go into a data center,
or multiple data centers. And we know this for a fact from Clouds
like Amazon: they've got data centers in Northern Virginia,
they've got data centers in Oregon and Northern California,
they've got them in Europe, South America, the Far East.
So all of a sudden now I've got data centers
and I've got capacity that maybe I'm not used to.
And I guess one of the benefits that we have here in academia
is we have these nice campus clusters and we have this campus
grid environment that's been set up to provide us with a nice scale.
But that's not necessarily the case in industry.
In industry typically people aren't collaborating
with other institutions. So for example,
you can pick something like the pharmaceutical industry,
and typically a company's not
collaborating with its competitors to say,
"Hey, go ahead and use my cluster." And that's obviously
a much bigger sharing environment that we have here in academia.
So all of a sudden places in industry will start using
the Cloud because they don't have the scale internally
that they can get from a place like an Amazon data center. 
So... and I'll touch on that a little bit more here in a couple minutes.
So the other thing with this is elasticity and utility computing.
Utility computing is a really interesting one in this space,
because when you get into the finances of actually
purchasing hardware versus running on hardware that you're
renting by the hour, those are different types of expenses.
And that may not mean much to you all, but when you start to work
with finance departments on whether I want to do a capital expenditure
or rent something by the hour, those dollars are different.
And so this concept of utility computing allows me to get  
literally thousands of cores that I could pay for with a credit card. 
That's a totally different model than all of a sudden
I'm gonna build a supercomputer in my data center
for millions of dollars. So I'll use somebody else's infrastructure,
I'll use their hardware, and I'll build a cluster on top of it, 
and I'll pay the thousand dollars per hour to use it. 
Now I may lose some performance benefits from that,
but the fact is I didn't have to go through all the networking,
all the drive replacements, all the details that go on behind the scenes
in the data centers, that people maybe aren't fully aware of,
to be able to use
these computing environments that are available to us.
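The trade-off between buying hardware and renting by the hour can be made concrete with a little arithmetic. This is a rough sketch only: the thousand dollars per hour is the figure mentioned above, while the two-million-dollar purchase price and the 48-hour job are assumed values chosen for illustration, and operating costs such as staff, power, and drive replacements are ignored.

```python
def rental_cost(hours: float, rate_per_hour: float) -> float:
    """Total cost of renting a cluster for a number of hours."""
    return hours * rate_per_hour


def break_even_hours(purchase_cost: float, rate_per_hour: float) -> float:
    """Hours of rental at which renting has cost as much as buying."""
    return purchase_cost / rate_per_hour


RATE_PER_HOUR = 1_000.0      # figure from the lecture
PURCHASE_COST = 2_000_000.0  # assumed purchase price, for illustration only

job_cost = rental_cost(48, RATE_PER_HOUR)  # a hypothetical 48-hour job
crossover = break_even_hours(PURCHASE_COST, RATE_PER_HOUR)

print(f"48-hour job on rented hardware: ${job_cost:,.0f}")
print(f"Renting matches the purchase price after {crossover:,.0f} hours")
```

Under these assumptions an occasional large job costs far less to rent than to own, and the purchase only pays off once the cluster is kept busy for thousands of hours.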
So from this kind of high level of virtualization and the Internet
and this Cloud topic, we have three levels of service
that have kind of been defined out there for the Cloud.
And I'll give some examples here. So the top-level one
is what people call Software as a Service.
So people out there are interacting with your Gmails,
your other web-based emails and that.
You've got these properties of scalability, elasticity,
but you're using kind of a software interface into their Cloud.
We don't know the details of the computers that are
behind the scenes when you go to check your Gmail account.
I just assume I've got elasticity, I've got scalability,
and sometimes it's actually free for me, I'm not paying
per hour unless I go over whatever they're up to now,
5 Gigs, I don't remember what the maximum size
of Gmail is now, but you've got this model of,
I'm not paying for the hardware underneath that,
I'm just paying for the utility of using that service.
The next one under that is called Platform as a Service,
and this Platform as a Service says I'm gonna kinda
give you a framework that you can kinda put your software into.
So if you kind of program to our standards, we'll let you go into our platform,
and so Google App Engine's an example of that. I can actually
create a Django app, or I can create another Web-based app
and plug into their application engine to do my work.
The lowest level of this is called Infrastructure as a Service,
and the reason why it's called Infrastructure as a Service is because
you literally have the nuts and bolts available to you.
To create a virtual machine, how big do you want it?
Create block storage, how much storage space do you want?
Attaching that storage, what device do you want that to be on,
and what should be your mount point? All these low-level details
for, in this case, building out a virtual cluster for Hadoop.
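The kinds of low-level decisions listed above (machine size, storage size, device, mount point) can be written down as a small specification. The sketch below is purely illustrative: `NodeSpec`, `build_cluster`, the device name `/dev/sdf`, the mount point `/mnt/hdfs`, and all the sizes are hypothetical and do not correspond to any particular provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Everything an infrastructure user must decide for one cluster node."""
    name: str
    cores: int         # how big should the virtual machine be?
    memory_gb: int
    volume_gb: int     # how much block storage should be created?
    device: str        # which device should the volume be attached as?
    mount_point: str   # where should the volume be mounted?


def build_cluster(node_count: int, cores: int, memory_gb: int, volume_gb: int,
                  device: str = "/dev/sdf",
                  mount_point: str = "/mnt/hdfs") -> list:
    """Return one NodeSpec per node of a hypothetical virtual Hadoop cluster."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return [
        NodeSpec(f"hadoop-node-{i}", cores, memory_gb, volume_gb,
                 device, mount_point)
        for i in range(1, node_count + 1)
    ]


cluster = build_cluster(4, cores=8, memory_gb=32, volume_gb=500)
total_storage = sum(node.volume_gb for node in cluster)

for node in cluster:
    print(node)
print(f"Total block storage: {total_storage} GB")
```

Every field here is something the user has to choose at the Infrastructure level, which is exactly the detail that the higher levels of service hide.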
So these are very technical details, and
maybe I need to know the technical details.
Or maybe I can go high enough up in the stack to say,
"Hey, I'm just a web developer, all I want to do is develop
an application into your platform," and so maybe Google App Engine
is an interesting one if I want to play in this space.
So you have this variety of levels that you're dealing with,
and there's complexity as you work your way down that stack,
you have to know the details. And so this is why sometimes people
with more of a Computer Science background, or more of a systems background,
may be more interested in Infrastructure than someone saying,
"Look, literally I just want to use the offering that you're
providing at Salesforce.com to deal with my sales staff,
and take advantage of the Cloud that you have behind that."
