(introductory music)
Hello and welcome.
Please enjoy this short readiness video
on Azure Cosmos DB.
Hi Deborah, how are you doing today?
Hi Sanjay, great to be here.
Great, it's always fun to talk to you.
So why don't you first tell us
about your role at Microsoft.
So my name is Deborah
and I am a program manager on the
Cosmos DB team.
We are focused on the developer experience.
So this covers
the SDKs, the portal, and the Notebooks experience.
I see.
So what are you going to cover today?
Today we are going to be covering
Partitioning.
Which is one of the core topics
that you need to know
in order to be successful
when you are new to Cosmos DB.
Along with data modeling,
which Thomas presented.
So today we will cover
the basics and fundamentals of partitioning.
Why it's so important for getting good scale,
good cost, and good performance on Cosmos DB,
and how choosing the right partition key,
whether it's for a
read-heavy or write-heavy workload,
has a huge impact on the success
of your workload.
Okay let's get started then.
Great.
So there are some fundamentals.
But before we do that,
I really want to talk about
why we partition
in the first place.
Yes.
There's a one-word answer, which is
scale.
So with modern applications and modern workloads,
your data cannot fit on one machine,
and even if it could, scaling that is a nightmare
because you have to keep buying
a bigger and better machine
and keep upgrading it.
In contrast, instead of this old scale-up model,
what Cosmos DB does is scale out.
So we do horizontal scaling.
Which means we take all your data
and distribute it among
a small number of physical machines.
So each machine is responsible
for a subset of the data.
So as an example here,
we have a bunch of people from
product teams, Cosmos DB and Postgres,
and each of us is responsible
for a certain part of the product.
And that way like in partitioning
each machine is responsible
for a subset of the data.
I see.
Okay.
So in a retail example
you would have billions of rows,
then you would partition that
you know across different machines in a way.
Yeah.
Okay.
Yeah and one common question we get is,
you know, partitioning is important.
When does it become super critical?
Yeah
We find that for our customers
who have a smaller workload,
say 50GB of data and 50,000 RU/s of scale or less,
you should still follow the best practices,
but you won't really see the true impact
until you get to, you know, terabytes of data
and hundreds of physical partitions,
like some of our largest customers.
I see.
So then how do you think about the keys?
The partitioning keys?
I think that is the most important decision.
Right?
Yeah.
So when it comes to the partition key,
before you can even choose it,
there are some fundamentals you should know.
And I'll show some demos
of what goes wrong
when you choose a good one or a bad one.
So the first definition.
I promise there are only two definitions.
So the first one is Logical partition.
Which means all the data with the same
partition key value is in
the same logical partition.
So if we took everyone who presented today
and partitioned by product group,
I am on Cosmos DB, Thomas is on Cosmos DB,
and other folks are on Postgres.
Myself and the other people
on my team who have Cosmos DB
as their partition key value
for product team are in the same logical partition.
So all that data is grouped together.
Same with Postgres and any other product.
In contrast, a physical partition
is an actual SSD-backed piece of
compute and storage
that stores the data.
So we take all the logical partitions
and distribute them among a small number
of physical partitions.
From the user's perspective,
you define one partition key per container.
All right.
So if I have one million logical partitions,
do I have that many, one million,
physical partitions as well?
That's a really good, common question
we get from people who are new to this.
One of the best practices, as you will see later,
is to have really high cardinality
among values.
So having one million logical partitions
is actually good.
Then you are like,
are you going to provision one million machines?
That sounds really expensive.
So the answer is no.
What we do is take all the logical partitions,
whether you have ten or a million,
and distribute them among
a small number of physical machines.
The way we choose the number of physical machines
is a heuristic based on how big your storage is.
So obviously if you need more storage
you've got to have more machines.
If you need a lot of RUs,
you need more scale and more machines.
But definitely, you will typically have
a lot of logical partitions
on a smaller number of physical partitions.
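What Deborah describes, many logical partitions hashed onto a few physical ones, can be sketched in a few lines of Python. This is purely illustrative: Cosmos DB's actual hash function and placement logic are internal, and the key names here are made up.

```python
import hashlib

def physical_partition(partition_key: str, num_physical: int) -> int:
    """Map a logical partition key to one of a small number of
    physical partitions via a stable hash (an illustrative stand-in
    for Cosmos DB's internal hashing)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_physical

# Ten thousand logical partitions still land on just 8 physical ones,
# so each physical partition holds many logical partitions.
placements = {physical_partition(f"user{i}", 8) for i in range(10_000)}
print(sorted(placements))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The same key always hashes to the same physical partition, which is what makes single-partition reads and writes cheap to route.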
I see, all right, let's keep going.
Cool.
All right so here just to make it really visual
for those who like to learn visually,
here is a diagram that kind of shows
what's happening behind the scenes in Cosmos DB.
I told you all the data is distributed
among the physical partitions.
But how does Cosmos DB know
which machine to put your data on?
And that is based on the partitioning key.
So if you look at this diagram
the purple box here
think of it as the physical partition.
Right, actual storage.
The blue boxes are the logical partitions.
So for the logical partition key,
that's the ABC123.
You might have multiple documents
with that same value.
All data with the same logical partition key
is also in the same physical partition.
Of course.
So if you have another one,
say a different partition key value,
it gets hashed and Cosmos DB figures out
which machine it belongs to.
The great thing for our customers
is that as your data size increases
or as you need more scale,
we'll automatically add more
partitions for you.
So you don't have to worry about,
you know, adding more machines
or physically,
manually sharding the data yourself.
Wow fantastic!
Okay.
So shall we start the demo?
Yeah, that's great.
So now let's get into the demo.
There are really
two kinds of key workloads we see.
One is the read-heavy workload.
So let's take a look at
how you can choose a good partitioning key
for a read-heavy workload and what might go wrong
if we don't choose a good partition key.
So here is some data,
and it's just product reviews
that users have written for a retail website.
So here is a sample one.
Someone said that
they reviewed a fleece jacket.
Right, and we have a username,
and in this workload
a common thing you would want to do,
is show all the reviews written by a user.
What's the scenario here?
So it's a retail scenario
where you buy things on a site
and users are also writing product reviews.
Okay, got it.
Okay so imagine you log on
to this website and you see,
okay here are all the reviews
that were written by this one person.
Yeah.
Which would be
a very simple SQL query.
So lets say,
let's actually go ahead
and write that query ourselves.
So let's say,
SELECT * FROM c
WHERE c.username equals,
let's pick this one,
"Kurt28".
We will run the query and see we get
around 881 results, and in the query stats
we see this query consumed around 123 RUs.
Which might not sound like many.
So if you wanted to run this query
a hundred times a second,
that's over 12,000 RUs per second, which is a lot.
Right.
123 RUs for one query,
especially for a common one,
is going to be kind of expensive.
So the first thing you want to do,
as you can probably guess,
is check your partitioning strategy.
By the way, I should have mentioned
that this scenario is read-heavy,
because if you think about a retail website,
people read a lot more product reviews
than they write product reviews.
Of course
Right.
So you want to optimize the cost of your queries
and key value look-ups.
So here we see how we partitioned.
Let's take a look
inside Settings.
We partitioned by ID.
But if you look at the query
we are filtering by user name.
Yes.
So what we are actually doing here,
remember that purple box diagram I showed you,
is going through
every single physical partition,
spending one or two RUs to check
the index and see:
do you have this data for this username?
Do you have this data?
Do you have this data?
Which, if you have a small workload,
checking, you know, five partitions for data
is not a big deal.
But if you have, you know,
hundreds or thousands of partitions,
hundreds of gigabytes or terabytes of data,
that overhead is going to add up.
So what we can do instead is
use a partition key
that we can use as a filter
in most of our queries.
So most of our queries are going to filter
by username in this retail website.
So here I have the same collection,
same exact data.
Only difference is I have now partitioned
by username.
Okay.
So let's run the same query here.
It's partitioned by username.
Right?
That's what you are doing?
Yeah.
All right and let's see how many
RUs this consumed.
And this one only consumed 40 RUs.
Instead of?
123.
So the same query on V1 was 123 RUs, and on V2, 40.
That's a big difference.
That's a huge difference.
If I want to run this query 100 times a second,
this one would only need 4,000 RU/s versus over 12,000,
which is about three times cheaper in cost as well.
Like monetary cost.
So you really see that with the same data,
same workload, choosing the right partition key
makes a huge difference in the cost.
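The arithmetic behind that comparison, using the RU numbers quoted in the demo, is easy to check:

```python
# RU charges observed in the demo (V1 partitioned by id, V2 by username).
ru_cross_partition = 123   # query fans out to every physical partition
ru_single_partition = 40   # query is routed to a single partition
queries_per_second = 100

needed_v1 = ru_cross_partition * queries_per_second   # 12,300 RU/s
needed_v2 = ru_single_partition * queries_per_second  # 4,000 RU/s
print(needed_v1, needed_v2)             # 12300 4000
print(round(needed_v1 / needed_v2, 1))  # 3.1, roughly 3x cheaper
```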
Absolutely.
Thank you for that great example.
And by the way, this advice
for the read-heavy workload
applies to anything read-heavy.
So user profiles, product reviews,
product catalogs,
with common partition key choices
like product ID, user ID, etcetera.
So besides that, of course,
what else could go wrong if you choose a bad key?
Yeah.
So high RUs are definitely a symptom.
Sometimes people say,
'Oh, my query seems to be going kind of slow.'
RUs and latency are kind of correlated.
If you are taking a lot of time to check
each partition, that's adding more time.
So those are common symptoms.
You should check your partition key first
and make sure you are able to use that
as the filter.
I see.
And anything else in terms of
IoT scenarios?
Yeah, so that's actually the next one.
When we see a customer's workload,
typically it is read-heavy,
kind of like that retail website
scenario, or write-heavy.
Write-heavy means you've got tons of data
coming into Cosmos DB at once.
Examples are IoT and telemetry.
IoT may have many different devices pushing data
once a second, even multiple times a second.
So you are really pushing
lots of data in at once.
So imagine a scenario
like a retail store.
You know, customers are coming in,
and there are IoT sensors in the store
sensing people in real time.
That's probably what that could be.
Right?
Yeah, it could be that,
it could be vehicles on a road,
it could be devices in a factory,
and as long as there's high volume,
it could even be users on a website,
tracking everything they click.
The point is, data is coming in very fast.
Yes, so what I have done here,
switching over to my VM,
is set up a demo scenario
to simulate a really write-heavy workload
on Cosmos DB,
and show the difference between a good partition key
and a bad partition key.
So the scenario I have chosen today
is vehicle telemetry, like IoT data.
But really it could be any of those scenarios.
Of course.
So we have a car
with lots of sensors, so lots of information.
And you can see we've got some
information like event name, description,
vehicle ID number, timestamp, date, and region.
Any of these could potentially
be a partition key.
So let's see what happens.
Let's just choose a few, measure
the performance, and see what we get.
To simulate a write-heavy workload,
I have created two collections here,
one partitioned by date
and one partitioned by vehicle ID number and date.
And for each one of these I have provisioned
50,000 RU/s.
These are pretty small documents,
so we should be able to get
50,000 RU/s worth of writes per second,
which is around 10,000-ish writes per second,
assuming 5 to 7 RUs per write.
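That estimate is just division over the provisioned throughput; a quick sanity check of the numbers from the demo:

```python
# 50,000 RU/s provisioned, each small write assumed to cost 5-7 RUs.
provisioned_ru_per_sec = 50_000
writes_per_sec_best = provisioned_ru_per_sec // 5   # 10,000 writes/s at 5 RU each
writes_per_sec_worst = provisioned_ru_per_sec // 7  # ~7,142 writes/s at 7 RU each
print(writes_per_sec_worst, writes_per_sec_best)    # 7142 10000
```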
So let's actually run this.
I have the bulk executor library
which is optimized for
sending lots of data at once.
So we are just going to simulate
writing a lot of data into Cosmos DB.
Now one common thing you might want to do is say,
okay, let's partition by date.
Right.
Because we'll query on it a lot
and let's see what happens.
Yes.
We are going to run it
and time how long it takes to do batches
of 10,000 records, and see how many RUs
we are effectively able to get.
If you think about date,
what do you think might go wrong?
Because
within a day, you know, there could be an
interesting pattern across dates.
Yeah, and especially because
there could be a heavy sale happening on one day,
or a busy day for cars.
If you think about it, even on just a regular day
we have a lot of cars on the road,
and they are all pushing data,
and today's date is going to be the same
for every vehicle.
Yeah.
So all of them are going to have the
same partition key value.
Which means they are all going to
hash to the same machine.
So they are all hammering that one machine,
and all the other machines you have,
which could serve requests,
are not getting any requests.
Which means you are not really
fully utilizing the RUs,
since each physical partition
gets an equal share of the total RUs.
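The hot-partition effect Deborah describes can be simulated without a database at all. Here a made-up hash stands in for Cosmos DB's internal placement, a fixed date plays the low-cardinality key, and synthetic VINs play the high-cardinality one:

```python
import hashlib
from collections import Counter

def physical_partition(key: str, num_physical: int) -> int:
    # Illustrative stand-in for Cosmos DB's internal hash-based placement.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % num_physical

NUM_PHYSICAL = 10
NUM_WRITES = 10_000

# Bad key: every document written today carries the same date value,
# so every write lands on a single physical partition.
bad_spread = Counter(
    physical_partition("2023-06-01", NUM_PHYSICAL) for _ in range(NUM_WRITES))

# Good key: a high-cardinality value (a VIN per vehicle) spreads the
# same writes across all physical partitions.
good_spread = Counter(
    physical_partition(f"VIN{i % 5_000}", NUM_PHYSICAL) for i in range(NUM_WRITES))

print(len(bad_spread), len(good_spread))  # 1 10
```

With the date key, nine of the ten partitions sit idle while one takes every write, which is exactly why the effective throughput falls far below what was provisioned.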
I see.
That is what we call the hot-partition problem.
As you can see here, we finished, but
we only got effective throughput of around
7,000 RUs per second, even though
we provisioned 50,000.
And the reason is we had a very low
cardinality partition key.
To the point where there is only one
distinct or unique value for date,
since I'm generating today's date.
Okay.
So in contrast, what you probably
want to do for write-heavy workloads
is choose something with high cardinality,
or a high number of distinct values.
Okay like VIN numbers.
Yeah exactly like VIN number.
In our case we are going to make it
even a bit more distinct.
We are going to use the VIN plus the current date.
Okay.
And the reason we do that
is that for IoT workloads that are extremely
write-heavy,
Cosmos DB has a 10GB constraint on
logical partition key values.
So if you have one vehicle ID number, say 123,
you might actually fill up 10GB of data
for 123 if you are writing,
you know, say ten documents a second
for an entire year.
So by making the synthetic key
combine the VIN with the date, we get
more granularity and don't hit that 10GB limit.
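A synthetic key like this is just string concatenation at write time. A minimal sketch, with a made-up VIN:

```python
from datetime import date

def synthetic_partition_key(vin: str, day: date) -> str:
    """Combine the VIN with the date so each vehicle's writes roll into
    a fresh logical partition every day, keeping any single logical
    partition under the per-partition storage limit."""
    return f"{vin}_{day.isoformat()}"

print(synthetic_partition_key("VIN123", date(2023, 6, 1)))  # VIN123_2023-06-01
```

Each document stores this value in its partition key property, and point reads for one vehicle on one day stay single-partition.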
Okay.
So let's run this.
Let's see this in action.
Yeah, let's see how fast it goes!
And, well, that finished in less than five seconds.
And if I run this a few more times,
let's run it again,
I'm going to get closer to 50,000 RUs per second.
Yeah, so we kind of like that.
Of course, now we ran it very, very fast.
It's the same data being written.
All we did is change the partition key.
Wow!
That's awesome.
Thank you so much.
Any closing thoughts?
Yeah, let me just put this up here.
This shows the scenarios
and the best practices for write-heavy workloads.
The closing thought is really, you know,
if you are starting out with Cosmos DB,
you want to master data modeling, partitioning,
and of course, later, RUs.
Those are the fundamental concepts for being
successful with Cosmos DB.
Fantastic.
Thank you so much,
it's always fun talking to you.
All right, thanks a lot Sanjay.
Thanks for watching this short
Azure readiness video.
