>> Hey friends, it's another episode of Azure Friday.
I'm here with Carrie MacDonald
and she's going to talk to me
about horizontal partitioning in Cosmos DB.
There may have been some things that I've forgotten to
do correctly and you're going to set me straight.
How are you?
>> Good. Thanks Scott.
So, as you probably
know, the way that Cosmos DB is laid out
is you create an account, and within that account you have
several containers, which we refer to as collections,
and these collections elastically
scale to unlimited throughput and storage.
But one of the most important things
to configure when you
create a collection is your partition key.
So, a lot of databases
are based around this idea of partitioning
your data on the back end,
and this partition key gives us a way
to route your data
appropriately to keep your queries efficient and whatnot.
So picking a good one is super important.
Kind of what this looks like on our back end
is we take the partition key,
the value that a particular document has assigned to it,
and we hash it such that it goes
to one of several physical partitions.
So it's not actually a one-to-one
mapping the way that we do it.
You might have
100 different partition key values, but that
doesn't mean you have 100 physical partitions.
And that makes it so that you
only have to pay for
what you're actually using.
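The many-to-one mapping she describes can be sketched roughly like this in Python. The partition count and the hash function here are illustrative assumptions; Cosmos DB uses its own internal hash and manages the physical partitions for you:

```python
import hashlib

PHYSICAL_PARTITIONS = 10  # hypothetical count; the service manages this

def physical_partition(partition_key_value: str) -> int:
    """Map a logical partition key value to a physical partition.
    Cosmos DB uses its own internal hash; this only illustrates
    the many-to-one idea."""
    digest = hashlib.md5(partition_key_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % PHYSICAL_PARTITIONS

# 100 distinct key values still land on at most 10 physical partitions.
keys = [f"user-{i}" for i in range(100)]
partitions_used = {physical_partition(k) for k in keys}
print(len(partitions_used))  # at most 10
```

The mapping is deterministic, so all documents sharing a key value always land on the same physical partition.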
>> Sometimes I feel like I'm getting
paralyzed, though, with needing to get this right.
What if I picked first name, and then
everyone's first name is John or Muhammad?
Now I've got a hotspot where one partition is
being beaten up because I picked
the wrong key, and that's scary.
>> Yeah. So like
I was saying, one of the most important things
to do when you're
creating your collection is
to pick a good partition key.
So like you said,
one of the biggest things that can happen
is you can end up with a hot partition key,
because you have to consider
both your request volume and your storage usage.
So it's not just
making sure
you have an even storage distribution;
you also want to have an even distribution
of when your data is getting hit.
Because if you provision, say, 10,000
RU/s on your collection and you have 10 partitions,
we actually split those RU/s evenly per partition.
So even if your total workload
is only 10,000 RU/s worth,
if one partition is getting
hit with more than 1,000 RU/s of that
because you chose a hot partition key,
that can cause you to exceed your provisioned throughput.
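The arithmetic behind the throttling she describes is easy to check. The numbers below are the ones from the conversation; the 30% hot-key share is a made-up illustration:

```python
# Provisioned throughput is divided evenly across physical partitions.
provisioned_rus = 10_000
physical_partitions = 10
per_partition_budget = provisioned_rus / physical_partitions  # 1,000 RU/s each

# A hypothetical hot key concentrating 30% of requests on one partition:
hot_partition_load = 0.30 * provisioned_rus  # 3,000 RU/s on one partition

# That partition gets throttled even though the total workload
# stays within the overall 10,000 RU/s budget.
print(hot_partition_load > per_partition_budget)  # True
```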
And then another thing that is really important
with your partition key is that you want to pick
one that you can use in queries. Because if you include
your partition key as part of a filter, like if you run
SELECT * FROM c WHERE c.partitionKey = 'some value',
then the query only has to hit one partition
versus having to fan out and
hit all the partitions.
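The targeted-versus-fan-out distinction can be simulated in a few lines. This is a toy model, not the Cosmos DB SDK; the collection contents and the `city` partition key are assumptions for illustration:

```python
# Simulated collection: documents bucketed by partition key value (city).
collection = {
    "Seattle":  [{"id": 1, "city": "Seattle"}, {"id": 2, "city": "Seattle"}],
    "Portland": [{"id": 3, "city": "Portland"}],
    "Boise":    [{"id": 4, "city": "Boise"}],
}

def targeted_query(city):
    """Like SELECT * FROM c WHERE c.city = @city: touches one partition."""
    return list(collection.get(city, [])), 1  # (documents, partitions hit)

def fan_out_query(predicate):
    """A filter that doesn't use the partition key must hit every partition."""
    docs = [d for bucket in collection.values() for d in bucket if predicate(d)]
    return docs, len(collection)

docs, hit = targeted_query("Seattle")
print(hit)  # 1
docs, hit = fan_out_query(lambda d: d["id"] == 3)
print(hit)  # 3
```

Even though both queries can return the right documents, the fan-out one consumes RU/s on every partition it touches.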
>> I see. So you don't want to pick a key
that ends you up with a hot partition.
But at the same time if you pick
a key that's one that you're often
using as a query value like user ID or email address,
that could be of benefit because it narrows your-.
>> Not necessarily the same value.
You just want to make sure you always use that attribute
in your queries;
it doesn't have to be the same value, because we route
to the various physical partitions
based off the value of the key.
So, one of the things we
say you can do to choose a good partition key is to
really understand your workload, so you know
what kind of requests you're making and
you can look at those queries.
You're not going to be able to ever
optimize it 100 percent.
It's always kind of, like we
like to say, 80/20, right?
Like it's going to be almost
impossible to get it 100 percent perfect but if you
can optimize 80 percent of
your workload to work well with your partition key,
then you're going to have a really good time.
>> That makes sense.
I feel, though, that I might not be able to.
You said I should really understand my workload.
>> Yeah.
>> But what if I'm by
myself and I've just made angry birds,
and I didn't know it was going to turn into a thing.
And then I wake up and now I have
a billion users and I had
made a mistake and picked the wrong partition key.
>> Yeah. So, what a lot of people don't know is we have
some really extensive metrics in the Azure portal for
debugging issues like picking a bad partition key.
So I have a couple of demos to
show that off a little bit.
>> Excellent.
>> Okay, so the first
demo is this workload that I have set up here.
The idea is that we're logging every time
we see an event happen in a particular location.
So we're logging a timestamp
related to the event, and we're logging the location.
In this case,
I'm using the city of the location as
the partition key, and you can see
here in the portal on our map
that we're being rate limited,
which means that we're exceeding
the throughput we provisioned
on our account.
So that's a problem.
You don't ever want to see that circle be orange.
>> I'm looking at your average throughput,
it's 132. Is that because maybe you have
100 at that location? Is that possible?
>> Well, so right now there's only one
location, so it hits all of them.
>> You're using more RU/s than you paid for?
>> Yes, exactly.
So we can look at this further: we can look at
our storage distribution, and you
can see this is a chart where we have
a per-partition-key distribution. It's
very obvious that we have
a hot partition key, and that partition key is Seattle.
So it seems like most of
these sightings that we're logging are
all happening right here in Seattle.
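The per-partition-key storage chart she's reading can be approximated offline by totaling serialized document sizes per key value. The sample documents below are hypothetical; in practice you'd read them from your collection:

```python
import json
from collections import Counter

# Hypothetical sample of logged sightings.
docs = [
    {"city": "Seattle", "ts": 1}, {"city": "Seattle", "ts": 2},
    {"city": "Seattle", "ts": 3}, {"city": "Portland", "ts": 4},
]

# Approximate storage per partition key by serialized document size.
storage = Counter()
for d in docs:
    storage[d["city"]] += len(json.dumps(d))

hot_key, hot_bytes = storage.most_common(1)[0]
print(hot_key)  # Seattle
```

A skew like this in your own data is the same signal the portal chart surfaces: one key value dominating storage.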
>> Oh, okay so I've made
a product and I've decided to make
a city my partition key except
the only city I'm targeting is Seattle.
>> Yeah. So obviously
that's not going to work out very well for you.
>> Got you.
>> So you can see that you have
a hot partition key: storage-wise
it has more than all the other ones, and
the physical partitions also work out that way.
We can click on this physical partition and
see that there are
two partition keys stored here, but
Seattle is obviously massive storage-wise.
And then you can see this in "Throughput" as well.
>> Some great tools,
it does make it very clear, doesn't it?
>> Yes, so you can see that we're
being throttled here, and then you can also see
one of the metrics that we look at
a lot for throughput:
Max consumed RU/s per partition key range.
Basically, that is
the maximum RU/s across all physical partitions.
So out of all your partitions, the one that
uses the most is what
we're showing in this particular chart.
And you can see that, because we
have one hot partition, it's pretty bad.
So we can actually click on one of
these spots on this chart,
and then this other chart will
sync to the time that I clicked on.
So a lot of times, if you see a spike, you can
click on the spike, then look at
the per-partition distribution and say, oh,
hey, this one's over the amount that we provisioned,
and look, there's Seattle.
And then we can go even
further: we can select Seattle and
immediately open a query in Data Explorer.
It's already prefilled
to filter on Seattle, so we can immediately start debugging,
looking at our documents that have that partition key.
>> I can see the line that says here's
how much I provision, what I paid for.
And here I can see I'm regularly spiking over it.
I can't just throw money at this problem though, can I?
>> You could, but
that won't be the most efficient use,
right? Because overall
you're paying for the right amount; it's
just that one particular partition
is getting hit so much more than
the others that it makes it
look like you need to pay for more.
>> I see.
>> But in order to
fully utilize everything that you're paying for,
it would be better to distribute that workload better.
>> Why throw money at a problem when you can just
simply fix the software and keep paying the same?
>> And then I have one more demo.
Another use case that we commonly see
is when people pick
a timestamp for their partition key, right?
You think, okay, we get a really
good distribution of storage with
the timestamp, because we're constantly
adding different values.
Over time, you see the physical partitions are
relatively evenly distributed, but then we go to "Throughput"
and you'll see we're still being throttled,
which is kind of strange. And if I can get it to scroll-.
>> Maybe try clicking
in this white area and use the arrow key.
>> There we go.
So now we can see that
there's only one partition that's getting hit, right?
Because we're all writing at the same time;
the only time that we're writing is right now.
So we're only going to be hitting one partition, because
the current time is
the partition key value we're writing.
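The timestamp-key trap she's showing can be simulated: storage spreads across many key values over the day, but every write happening "now" lands on a single key. The minute-level key format and the date are illustrative assumptions:

```python
from collections import Counter
from datetime import datetime, timedelta

def partition_key_for(ts: datetime) -> str:
    # With a timestamp partition key, the key value *is* the time bucket.
    return ts.strftime("%Y-%m-%d %H:%M")

# Storage over a day spreads across many distinct key values...
day = [datetime(2018, 5, 1) + timedelta(minutes=m) for m in range(0, 1440, 10)]
storage_keys = {partition_key_for(t) for t in day}

# ...but all 1,000 writes arriving "now" land on a single key.
now = datetime(2018, 5, 1, 12, 0)
writes = Counter(partition_key_for(now) for _ in range(1000))

print(len(storage_keys))  # 144 distinct keys across the day
print(len(writes))        # 1 -- every current write hits one key
```

That is why the storage chart looks balanced while the throughput chart still shows one throttled partition.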
>> So how do I fix this? Do I
kind of re-partition everything?
Can I pick another key, is it too late?
>> So, you do need to
recreate your collection in
order to pick a new partition key,
which is why it's really important to
get this right the first time, and to
test different options and whatnot.
But you know we're always willing
to help people with this problem
and take questions and help
people optimize their workload to get
the best effect of this product that they possibly can.
>> So that's useful. So worst-case scenario,
I can make another collection and
copy my data over. But during my testing phase, while
I'm getting started, while I'm developing
the product, I could try it out.
I could try out different collections and test them.
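Since the partition key can't be changed in place, the "copy my data over" step amounts to reading every document and writing it into a new collection keyed differently. A toy sketch of that re-bucketing (a real migration would use the SDK or the change feed; the document shape and field names here are made up):

```python
def rebucket(docs, new_key):
    """Group documents by a new partition key attribute."""
    new_collection = {}
    for d in docs:
        new_collection.setdefault(d[new_key], []).append(d)
    return new_collection

old_docs = [
    {"id": "1", "city": "Seattle", "userId": "alice"},
    {"id": "2", "city": "Seattle", "userId": "bob"},
]

# The old collection was keyed by city (one hot bucket);
# re-keying by userId spreads the same documents over two buckets.
migrated = rebucket(old_docs, "userId")
print(len(migrated))  # 2
```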
>> Yep and you can debug right here in the Azure portal.
>> Where can people go to get help?
>> AskCosmosDB@microsoft.com.
>> AskCosmosDB@microsoft.com that's very convenient.
Fantastic. I am learning all about how to scale
horizontally with Cosmos DB here on Azure Friday.
