>> Did you know you can import
data into Azure Cosmos DB up
to 10 times faster using
the new Bulk API feature?
Azure Cosmos DB already offers server-side transaction support via stored procedures.
But now you can also
batch API calls from
the client without having to author
or manage a stored procedure.
Matias is here to
show us how Bulk and
Batch APIs work today
on Azure Friday.
Hey everyone. Donovan Brown here with
another episode of Azure Friday.
I'm here with Matias, and
we're going to talk about
the Bulk and Batch API
features of Cosmos DB.
So tell me why these are special.
Why do I care about these features?
>> Hi Donovan. Thank you
for the opportunity.
These are two new features that we are introducing in the Cosmos DB .NET SDK.
>> Okay.
>> We are working to add them to the other languages as well.
>> Okay.
>> But the thing is, I wanted to show you how these two work, because they cover some very special scenarios that our customers need, and they are really new, so we need the feedback; we need users to try them and share what they could build and accomplish with them.
>> So are these preview? Are they GA?
>> No, they're GA, so you can pick them up from the Cosmos DB NuGet package.
>> Okay.
>> From any of the latest versions.
>> Okay. Got it.
>> So we'll start with the Batch API. The Batch API, like you mentioned before with stored procedures, basically lets you define from your code a set of operations that need to commit or roll back as a single unit in your Cosmos DB database.
>> Got you. So when you think of a batch, it doesn't mean that these are all the same operation over and over again.
This could be several operations
that define a transaction,
but I don't want to have to go
maintain stored
procedures to do that.
>> Exactly.
>> Got it.
>> So you can basically version your code and have all the changes in your code. You don't need extra work to update the stored procedure through deployments, so it's much easier.
>> That's one of the things, because I come from a DevOps background, and we always talk about being able to deploy code quickly, and one of the beautiful things about Cosmos DB was that the schema was in the code.
>> Exactly.
>> So if you just deployed your code,
you deployed the latest schema.
>> Yeah.
>> Unless you started using
stored procedures and all of
a sudden you had to do that.
>> You had to update the stored procedure through the deployment then, and if you had to roll back, then it's...
>> It's really tricky. But
now, all that's taken care of.
>> Exactly.
>> Awesome.
>> For this, what I want to show you is basically that we are going to, in a single batch operation, create a show and three episodes. In this example, I'm creating a show called AzureFriday and three episodes for the Cosmos DB SDK features.
Now let me press F5 and get this rolling. The idea is that I will create all four documents as a single operation, and then I will show you how to basically get the results and understand whether those operations completed or failed.
>> Okay.
>> Now, this is running. So here I'm hitting the breakpoints where I'm creating the batch.
I'm defining the show,
and like you mentioned, Cosmos
DB is a schema-less database.
So I'm creating basically
documents that have
different attributes
and different schema.
Now when this batch comes back, I will be able to, from the response, basically unwrap the individual point operation results.
>> Okay.
>> So I have four create-item operations, and I have four create-item results, and I have this API that basically lets me ask the batch response, "Give me the particular result for operation number 1, 2, 3, or 0," since it's a zero-based array. So in this case, I'm extracting the results of episodes 1, 2, and 3, and the result of the show itself.
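In the .NET SDK (Microsoft.Azure.Cosmos), the pattern being demoed looks roughly like this sketch; the show/episode variables, types, and the partition key value are illustrative, not taken from the demo code:

```csharp
// All items in a transactional batch must share the same partition key value.
TransactionalBatch batch = container
    .CreateTransactionalBatch(new PartitionKey("AzureFriday"))
    .CreateItem(show)       // the show document
    .CreateItem(episode1)   // three episode documents, each free to have a different shape
    .CreateItem(episode2)
    .CreateItem(episode3);

using (TransactionalBatchResponse response = await batch.ExecuteAsync())
{
    // Unwrap an individual point-operation result by its zero-based index.
    TransactionalBatchOperationResult<Episode> firstEpisode =
        response.GetOperationResultAtIndex<Episode>(1);
}
```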
Now as a second operation,
what I want to do is, I
want to update the show.
Maybe change one of its attributes.
In this case, the last updated time.
But at the same time
in the same batch,
I'm going to try and
create another item.
Obviously, this item has the same name, or the same ID, as one that I previously created. So the idea is to show how a batch that contains multiple operations, where one of them fails, actually behaves.
>> So the result is that this batch fails, and when I inspect the particular operations, I see that the first operation, the Replace, could actually have succeeded, but has a status code of a failed dependency.
>> Okay.
>> Meaning that this failed
because something else
in the batch failed.
In this case, it's the other
operation that was the Create,
which basically failed
with a conflict.
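Inspecting a failed batch might look like this sketch, assuming the same SDK types as the demo; the status codes are the ones described here (424 FailedDependency, 409 Conflict):

```csharp
using (TransactionalBatchResponse response = await batch.ExecuteAsync())
{
    if (!response.IsSuccessStatusCode)
    {
        // Operations that would have succeeded report 424 FailedDependency;
        // the operation that caused the rollback reports its own code,
        // e.g. 409 Conflict for a duplicate id.
        for (int i = 0; i < response.Count; i++)
        {
            Console.WriteLine($"Operation {i}: {response[i].StatusCode}");
        }
    }
}
```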
>> Okay. So I have a question here. In the above and even in this scenario, you're showing me the individual status of each individual item. But do I get a global one that says, "Yeah, everything was successful. There's no reason to go digging through the details"?
>> Exactly.
>> Okay.
>> It's part of the response.
So the response either
succeeds or not.
>> Got it.
>> So we have like a global
attribute that says,
"Everything is okay or not."
>> Because the only time I ever want to dig into each individual one is if it failed, and I want to figure out the details of why it failed.
>> Exactly.
>> But I don't necessarily want to do
this every time to make
sure it's succeeded.
>> Exactly.
>> Got it.
>> It's optional, depending on the operations. Let's say you did a read operation and you want the results; then you would probably go and get that particular result.
>> Okay.
>> The next demo is basically just doing these multiple batch operations, where I'm showing you how to do multiple things. I don't have to do all Creates or all Replaces. I can mix and match any point operations to actually achieve my transactional results.
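Mixing point operations in one transactional batch could be sketched like this (the ids and item variables are hypothetical):

```csharp
// Replace, create, and delete different documents as one atomic unit.
TransactionalBatchResponse response = await container
    .CreateTransactionalBatch(new PartitionKey("AzureFriday"))
    .ReplaceItem("show-id", updatedShow)   // update an existing document
    .CreateItem(newEpisode)                // create a new one
    .DeleteItem("old-episode-id")          // and delete another
    .ExecuteAsync();
```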
>> Got it.
>> That's pretty much the idea with batch. It's a way for users to extract the logic from stored procedures and put it in code, making sure the deployments don't require any extra step, so it's much easier.
>> Because these are the type of atomic operations that are so important when you're dealing with money.
>> Yeah.
>> You don't want money to be taken from somewhere and fail to go somewhere else, so now there's money in limbo; that's really important. That's why stored procedures were used, because we needed that atomicity: either it all happens or it doesn't. Again, that was my only sad part when I was working with Cosmos. I was like, "Great, I don't ever have to..." My DevOps pipeline got so much easier.
>> Yeah.
>> Then they're like, "Oh, no, Donovan, you still have stored procedures," and we're like, "Oh my gosh, we were so close." And now we're there.
>> Now you have an alternative.
>> This is awesome.
>> The other new feature is, like I mentioned before, bulk. Let me quickly show you. So the idea of bulk is to tackle a scenario where users need to push large numbers of operations into a database. Normally, when you are using the SDK, you would create these single point operations, one for each of your files or documents, and then you send the whole thing, and each of these operations will eventually yield one network call or one service call. So if you have a million operations, those are a million calls. Each call has network latency.
>> Sure.
>> So it's probably not as fast as you would like it to be. So the bulk feature is basically enabled just by taking your Cosmos client initialization and adding this bulk execution extra setting, and basically you pass a Boolean here to say, "I want it enabled or not."
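Enabling bulk is a single option on the client initialization; the endpoint and authKey variables here are placeholders:

```csharp
// Turning bulk support on for every operation made through this client.
CosmosClient client = new CosmosClient(
    endpoint,
    authKey,
    new CosmosClientOptions { AllowBulkExecution = true });
```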
What I'm going to show you is basically this demo, which creates a list of items, 300k operations, to dump into my Cosmos DB database; then I'm just calling CreateItem on the container of the collection.
I'm not doing anything different,
I'm just doing the same CreateItem
operation that I will be doing
in a normal scenario where I'm
just inserting these single items.
The real magic is
happening underneath.
So this code is going to run. I'm going to quickly run it to show you this.
On the left, I have the same
code but with bulk turned off.
>> Okay.
>> On the right, I'm going to run the same code but with bulk turned on.
>> Got it. To turn it on or off, will you just be changing that Boolean value in that one call?
>> Exactly.
>> But the way that I write my code, I don't have to say, "Oh, I'm writing a bulk insert, so I have to write my code differently." I just go and say, "I need to insert 3,000 rows into this database. Here's how I'm going to do it." Then, oh, if I want to use bulk, I just change that one value and the magic in the code works.
>> There is no change whatsoever in the way you use the SDK. There is nothing you need to learn.
>> Awesome.
>> You just set a flag that defines the behavior.
>> Awesome.
>> While these two run, I wanted to basically explain what is happening underneath, because like I said, there is no Bulk API change.
What we are doing is, I like to
think about it like a train station.
>> Okay.
>> So in a train station,
you have multiple tracks.
Here we will have multiple tracks
where each track has a train,
and each train is for a
particular partition.
>> Okay.
>> So what happens is, when you send
all the operations into the SDK,
the SDK will start to group
these operations by
partition affinity,
and put them as passengers
in different trains.
All these trains will
basically run in parallel.
So when a train gets filled with operations, it goes to the server, and then another train can come onto the track and keep filling up.
>> Sure.
>> That is all happening in the background, while you are just doing your activities.
>> So my question is: How do I determine how big these trains are, or how many transactions go in before the train leaves? Can you configure that?
>> So right now, and this is why we need the feedback, we have this setting, but we are basically defining a value that we believe is the right one. But the idea is to gather feedback and maybe make this an optional parameter where the user can define the size of the train.
>> Sure.
>> [inaudible].
>> It would really help if I could control the bulk sizes I have. Because if I have an idea of how much data I have, then I might be able to give you some help in determining what the proper size should be.
I know I have 40 million records,
maybe we don't want to
send 1,900 at a time,
maybe we'll send a million at
a time or something like that.
>> Exactly, and then basically what we do is send all these trains in parallel.
>> Sure.
>> So there are multiple requests in parallel, but if you compare the number of requests with normal operations, it's always much lower.
>> Sure. I guess, as you were saying before, there's always network latency. There's even sometimes cost on ingress and egress of the network, and that's all been reduced for us now by simply changing a value to true.
>> Exactly. The only requirement is that you execute these operations in a concurrent way. So you create the tasks, and then you can do something like this, basically.
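The concurrent pattern he describes can be sketched like this; the Item type and its Pk partition key property are hypothetical:

```csharp
// Queue the operations as concurrent tasks rather than awaiting each one.
List<Task> tasks = new List<Task>(items.Count);
foreach (Item item in items)
{
    // Each call is an ordinary point operation; with AllowBulkExecution
    // enabled, the SDK groups them by partition behind the scenes.
    tasks.Add(container.CreateItemAsync(item, new PartitionKey(item.Pk)));
}
await Task.WhenAll(tasks);
```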
>> I see. Got it.
>> So the way we basically group these operations is by their concurrency. So as they start coming in, we start to basically put them in different, independent tracks, and then send them away.
>> Got you.
>> Similar to what happens in batch, when we get the response, we basically wire up all those responses to their original callers. So from the user's point of view, they didn't really even realize that all these requests were going in and the trains were leaving the station.
>> That is awesome. It's really cool to see all this benefit from one value. Because it's usually that one value that breaks my code, and you can't find it. Now it's nice to have one value that's going to make my code much more...
>> Much easier.
>> ...performant than it used to be, exactly.
>> Yeah.
>> So in the last demo there,
I think you have them
side by side so we can
see the performance
difference, right?
>> Exactly. You can see that it's the same code, the same number of operations, running on this notebook; obviously, the time basically reflects what it takes to get from my notebook to the Azure data center. But the thing is, the same code in the same [inaudible] with the same resources is taking 36 seconds versus two minutes.
>> Yeah. It's really cool, and again, it's great to be able to see those gains from such a simple change as a developer.
>> Yeah.
>> Not having to change the way that I think. Because as soon as you started talking about bulk and batch, I wondered [inaudible] whether I'd have to change my code, and I don't have to do that, and getting out of stored procedures is just the best news I've heard all day.
So thank you so much.
We're learning all
about the new Bulk and
Batch APIs of Cosmos DB
here on Azure Friday.
[MUSIC].
