>> You guys have made it through the maze to get over here. Congratulations. I promise this session will be worthwhile. Today we'll talk a little bit about Azure Cosmos DB. I'll give you a quick TL;DR in case you're new to the service: what the heck is this thing? Then we'll talk about the resource model, request units, automatic indexing, and then we'll talk about the cool new things we have available for Build. First, what is this thing?
In a nutshell, Cosmos DB does three things well. Partitioning gives you elastic scale-out, so you can start with something very small and grow seamlessly to something very large. We deliver turnkey global distribution, replicating your data to any number of regions, along with a model where you can manage performance with respect to consistency. And resource governance allows us to give you availability: five-nines of availability through multi-region deployments.
However, availability is just one aspect of a database. What you also care about is performance: is it possible to achieve predictable performance? What we've done is implement resource governance at a very fine granularity. That allows us to give you novel capabilities, like guaranteeing the throughput as well as the latency for every individual request. On top of this core platform,
we've built a data store that can efficiently project four different non-relational data models. These can then be exposed through a variety of different APIs, ranging from the Tables API and Cassandra to our own dialect of SQL, and Gremlin. Gremlin is kind of the oddball of the bunch, but it's really cool: it can do graph traversals.
Now that you have an idea of what this thing is, how do you take advantage of it? What are the core things you need to know? Starting with the resource model, the first thing you will be presented with is an account. The account is just a URI and endpoint. Setting this up is extremely simple: go to the Azure portal, go to Cosmos DB, name it, choose an API, choose an Azure subscription and resource group, pick a starting location, and click create. You're up and running in five minutes. You're not spending a week.
A database has a set of containers that store your data. Depending on the data model, we'll project them differently. If you're coming in through a document-oriented data model, we call this a collection. If you're using our Gremlin API, it's a graph. If you're using the Tables API, it's a table. Underneath the container is where things get interesting: this is likely the first time you'll see physical resources. We give you unlimited storage through the elastic scale-out. We also ask you to provision throughput, and you can do that on demand.
We can deploy that across a number of partition sets underneath. The magic that makes this happen: you're going to define a partition key. That's going to be a logical hint for how you want to distribute your data. We'll talk more about that in just a second.
Once you provision the container, you can store your data, of course. We call these items to be data-model agnostic, but these would be your documents, your rows, your vertices and edges. You can implement stored procedures and triggers. These are going to be your mechanism for transactions. It turns out distributed transactions are very difficult: you end up having to do a two-phase commit, and there are trade-offs with respect to how you manage conflicts and concurrency. A lot of databases will steer you toward doing transactions over a single record at a time. Cosmos DB brings transactions back onto the table through the use of stored procedures and triggers, with the caveat that you must scope them to a partition key. We also have a user-defined function model for the query language. The last thing to know about is conflicts. Conflicts are interesting because we guarantee the durability of your data; we don't play any tricks. Every time you make a write, we have durably committed it to a quorum of replicas.
So what is this conflicts feed? Well, for one, we support multi-region deployments. When you do an automatic failover, there's going to be a trailing window of about 10 to a few hundred milliseconds, depending on the distance between the two regions. That is because we also, like you, follow the laws of physics: it turns out you can't send packets faster than the speed of light. So we have metrics where we'll expose a probabilistic staleness graph, and should we ever have to do an automatic failover due to a natural disaster, we can expose any conflicting writes through the conflicts feed. We can also expose conflicts for multi-master, which I will talk about later in this session.
That's kind of the user-facing resource model.
What does this look like behind the scenes? This is a globally distributed database, so we like to start with planet Earth. We have 40 Azure regions today, and it keeps growing day by day; I've personally lost track.
In these regions, we build data centers. We deploy these big racks we
Call stamps and divide them into fault domains.
If we deploy a cluster of
Several hundred machines, we can
Deploy a replica set across multiple fault domains that
Gives you something highly resilient, even to hardware failures.
This is where the magic happens. Within those machines, we host database replicas. We have a transport layer for doing replication, admission control for authentication and authorization, and the engine where we do all of our querying and indexing.
Within each of these containers, we have partition sets composed of database replicas. We host four replicas per region. The reason we have four is so we can offer you durability while we deploy database updates and hardware updates completely transparently to you. The mechanism relies on quorums: we normally look for a write quorum of three out of four, and whenever we want to pull a replica out, say to apply an OS update, we can resize that quorum to two out of three. By always having at least three, plucking one replica out of the replica set means those database or software updates are completely transparent to you, and you can still lose a replica without affecting availability, because we have a quorum at any given time. This database is always online. You don't have to worry about hardware updates or software updates; we're always available. Always online.
So now that you have an idea of what the system looks like behind the scenes, let's look at one of the most important concepts: request units. The first time people dive in, I think it's a little bit of a confusing subject, but it's something awesome.
What a request unit is: we've normalized all the physical resources it takes to process a request into a single dimension.
When you're scaling your application, how many of you
Guys have done capacity planning for your database before?
How many of you guys actually find this is a fun exercise?
Whoa. There's two of you guys. Awesome.
The thing about capacity planning is it's really tough, especially when you're deploying on premises. You think about CPU, IO. Your boss comes to you and says, I need you to scale from 1,000 requests per second to 2,000 because that's what our forecast tells us. And you go: do I add more memory? More CPU? How much? I don't know. You do all this benchmarking and stress testing, figure out what the bottleneck is, and play whack-a-mole: relieve the bottleneck and suddenly a new bottleneck shows up.
We can do better than that. We use machine learning to produce a logical charge, in request units, for every request, and we make sure that charge is stable. If a request takes five request units today, our promise to you is that's predictable. So if you need to do 1,000 operations per second, and you know these operations cost five request units each, that's 5,000 request units per second. Should you need to grow to 2,000 operations per second, five times 2,000 is 10,000. Your capacity planning is literally mental math at this point. We've made that really easy.
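That mental math can be captured in a few lines. This is a toy sketch, not part of any SDK; the operation names and RU costs are made-up examples:

```python
def required_rus(workload: dict) -> float:
    """Total RU/s to provision for a mix of operations.

    workload maps an operation name to (operations per second, RU cost
    per operation); the answer is just the sum of rate * cost.
    """
    return sum(rate * cost for rate, cost in workload.values())

# 1,000 reads/s at 5 RU each -> provision 5,000 RU/s
print(required_rus({"read": (1_000, 5)}))
# Forecast doubles to 2,000 reads/s -> 10,000 RU/s
print(required_rus({"read": (2_000, 5)}))
```

Mixed workloads just add up: a read rate and a write rate each contribute their own rate-times-cost term.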
You do have to understand what a request unit is, because every request is a little bit different. For a read, while we guarantee that a follow-up read carries the same stable, predictable charge, we can often go to a single replica, pull out a value, and return it, depending on the consistency level. For a write, we're going to have to replicate to the other replicas. We normalize all of this into request units; you can roughly think of it in terms of how many IOs a request takes.
It's a token-bucket model: you're provisioning a bucket full of request units. Every time you perform a request, we take tokens out of that bucket, and every second we replenish it. If you exhaust the bucket, we'll instruct the client: hey, hold on, retry the request a second later. If you have spikes in your workload, you can use this rate limiting to amortize them over time, and you can increase and decrease these request units on demand. That's what is really powerful about this being a cloud system: you only pay for what you need, and you can increase and decrease how much you've provisioned at any time.
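Here's a toy model of that token-bucket accounting in plain Python (this is an illustration, not SDK code): a bucket holding the provisioned RUs, drained per request and refilled every second, with overdrafts told to retry.

```python
class RequestUnitBucket:
    """Toy per-second RU accounting: drain tokens per request,
    refill once a second, reject overdrafts with a retry signal."""

    def __init__(self, provisioned_rus: float):
        self.provisioned = provisioned_rus
        self.available = provisioned_rus

    def tick(self) -> None:
        # Called once per simulated second: replenish the bucket.
        self.available = self.provisioned

    def try_request(self, cost: float) -> bool:
        # True if the request fits in this second's budget;
        # False means "hold on, retry a second later".
        if cost <= self.available:
            self.available -= cost
            return True
        return False

bucket = RequestUnitBucket(provisioned_rus=100)
print(bucket.try_request(60))   # fits: 40 RU left this second
print(bucket.try_request(60))   # bucket exhausted: back off
bucket.tick()                   # a second passes, bucket refills
print(bucket.try_request(60))   # fits again
```

The point of the model: a burst above your provisioned rate isn't an error, it's a signal to spread the work over the next few seconds.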
We bill this hourly. If your workload is hot from 9:00 to 5:00, increase throughput then and decrease it for the hours after. We also make use of leftover RUs: you can set data retention policies out of the box, and index transformations and other background tasks can take advantage of those leftover RUs.
One of the new things we're releasing at Build: until now, you provisioned request units at a container. This is great when you co-locate data in a single container. However, sometimes, say when you're using our MongoDB API, a database is treated as a logical namespace and you end up provisioning a lot of containers. So we now allow you to set the request units at the database or keyspace level, and the containers can share that RU-per-second budget. They will be co-located on the exact same cluster. So how do we scale out in Cosmos DB?
One of the nice things is you can start small and go large, or you can start extremely large on day one. Each container is tied to a cluster of machines that we can dynamically resize. You're going to give us a partition key describing how you want to distribute your data, and we do consistent hashing on it, so if you have, say, incremental user IDs, you're not front-loading them all onto a single partition. Once we know the distribution, we map partition keys to underlying storage nodes based on your actual storage and throughput needs. If you have a billion partition key values, you don't have to pay for a billion partition sets; that would be really expensive. We size it down and pack many of these partition keys onto a smaller set of physical partitions.
The nice thing about consistent hashing is what happens when you want to dynamically grow or shrink the cluster. Let's say the fourth partition on the end here is growing and starting to fill up. We can monitor this ahead of time and do what's called a split: for that range of values, we give them double the storage and double the throughput by splitting them in two.
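A sketch of that idea in Python (illustrative only; Cosmos DB's real hash function and range management are internal): partition key values hash into a fixed space, physical partitions own ranges of that space, and a split halves one hot range without touching any of the others.

```python
import hashlib

SPACE = 2**32  # size of the hash space

def key_hash(partition_key: str) -> int:
    # A stable hash of the partition key value (md5 only for the sketch).
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16) % SPACE

class RangePartitioning:
    def __init__(self, partitions: int):
        step = SPACE // partitions
        # boundaries[i] .. boundaries[i+1] is the range owned by partition i
        self.boundaries = [i * step for i in range(partitions)] + [SPACE]

    def partition_for(self, partition_key: str) -> int:
        h = key_hash(partition_key)
        return max(i for i, b in enumerate(self.boundaries[:-1]) if b <= h)

    def split(self, index: int) -> None:
        # Give a hot range double the storage/throughput by halving it.
        lo, hi = self.boundaries[index], self.boundaries[index + 1]
        self.boundaries.insert(index + 1, (lo + hi) // 2)

ring = RangePartitioning(partitions=4)
p = ring.partition_for("device-42")  # same key always lands on same range
ring.split(p)                        # that range is hot: split it in two
print(len(ring.boundaries) - 1)      # 5 physical partitions now
```

Because only the split range's boundaries change, data on every other partition stays exactly where it was.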
The really cool thing is we're not just giving you tooling for elastic scale; we're managing this for you. We'll proactively grow the number of partitions, so from an application perspective, you have a database that grows as your application grows. Now, we have done 95% of the work of sharding, which is all of the operational cost: how do you route requests, how do you move the data, how do you grow and shrink the cluster. The remaining 5% is just a design-time decision, and it's your exercise: choosing a good partition key.
What is a good partition key? Well, at a high level, you have two design goals. First, you want to distribute the overall storage volume to avoid hot keys. Say you have an IoT workload. Some people might choose the current time as the partition key. That spreads data across many different key values, but I want to point out that you also want to distribute your overall request volume: at any given moment, all of the requests coming from all those devices will carry the same current-time value, so they all land on one partition. You want to distribute the throughput as well. A common technique for IoT telemetry is to choose the device ID as the partition key. That gives you an even distribution over storage as well as over request volume, and, as it turns out, most queries will be scoped to a particular device as well.
>> The second goal: you want to choose a partition key that allows you to scope your queries. Our query routers are intelligent; they ship as part of the SDK. The SDK will pull out the partition key value and say: hey, you're running a query, I know "abc" hashes to, let's say, 123, and 123 lives on partition five, so let me ask partition five directly for that data. If your query doesn't include the partition key, what happens is called a fan-out: we just ask all of the partitions in parallel. Because this is heavily parallelized, the latency impact is minimal, but over time it's not the most efficient thing to do. Ideally you choose a partition key that lets you scope the bulk of your queries.
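A toy container makes the scoped-versus-fan-out difference concrete (the routing here is a simplification; real SDKs keep a partition-key-range map rather than a modulo):

```python
class ToyContainer:
    def __init__(self, partition_count: int):
        self.partitions = [[] for _ in range(partition_count)]
        self.partitions_touched = 0  # by the most recent query

    def _index(self, pk: str) -> int:
        return hash(pk) % len(self.partitions)

    def insert(self, pk: str, item: dict) -> None:
        self.partitions[self._index(pk)].append(item)

    def query(self, predicate, pk=None):
        if pk is not None:
            targets = [self.partitions[self._index(pk)]]  # scoped: 1 partition
        else:
            targets = self.partitions                     # fan-out: all of them
        self.partitions_touched = len(targets)
        return [item for part in targets for item in part if predicate(item)]

c = ToyContainer(partition_count=8)
c.insert("SEA", {"airport": "SEA", "delayed": True})
c.insert("LAX", {"airport": "LAX", "delayed": False})

c.query(lambda d: d["delayed"], pk="SEA")
print(c.partitions_touched)   # 1: the router asked one partition directly

c.query(lambda d: d["delayed"])
print(c.partitions_touched)   # 8: no partition key, so fan-out
```

Same results either way; the difference is how many partitions had to do work, which is what you pay for at scale.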
You're going to ballpark your scale needs: do I have tens of gigabytes, do I have hundreds of terabytes? At the end of the day, it's very easy to get into analysis paralysis. What if I want to do this or that? You can spend years thinking about what-ifs. Those are cases that happen every now and again, and it's totally okay to fan out for those. For a read-heavy workload, you want to look at your top three or top five queries and find commonalities. The main things to think about, in addition to distributing the overall request volume, are your transactional needs.
Some general tips here. A lot of people do fall into this analysis-paralysis trap. Try to avoid that: build a proof of concept. Once you've built it, it will become crystal clear what a good partition key for your application might be. The other thing: don't be afraid of having too many partition key values. I get countless questions going, hey, I'm building this massive social media platform, I might have to support billions of users; is that too many keys? The answer is no. The higher the cardinality, the better. If you have billions of partition key values, we can start small, co-locate them, and grow that elastically toward a billion different physical partitions. So with that said, I would like to invite my colleague up here. He's going to tell you about some awesome new things we're working on: a bulk executor library to make building applications even easier.
>> Thanks for that, Andrew. Hope you guys have been having a good Build so far. As Andrew was saying, Azure Cosmos DB is a flexible database service. It gives you the opportunity to scale up to really large read and write workloads, which is essentially what we call elastic scale-out, and it allows you to store large amounts of data. All of this is very, very predictable. The service itself gives you a lot of potential, and you want to be able to exploit that on the client side. That's where the library we're GA'ing comes in. The bulk executor library is an extension library to our core SDKs. It allows bulk operations on Cosmos DB: we support bulk import, bulk update, and bulk delete, and I will go into further detail about those. We're GA'ing it for .NET and Java. It's not that you couldn't perform bulk operations earlier. What you used to do was write multi-threaded applications that interacted with Cosmos DB, where each thread performed point operations: if you were doing a bulk import, you would use point requests to write documents one at a time. Now, there's a lot of overhead when you're using the core SDK for such a purpose.
What if you're trying to move a terabyte of data? When you're doing this, you also have the issue of how to handle rate limiting. As Andrew was mentioning, we have the concept of request units. Say at a specific time you're loading the service too much: the service will return a status code saying you have exceeded the provisioned throughput on the collection. You would need to write custom logic to handle that rate-limiting feedback, as well as request time-outs and any other transient issues when interacting with a cloud service. All of that is now inherently handled inside the bulk executor library.
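The kind of boilerplate being described looks roughly like this in Python (the exception and hint names here are invented for illustration, not the real SDK types): on a rate-limit response, honor the service's retry-after hint and try again, up to a cap.

```python
import time

class RateLimitedError(Exception):
    """Stand-in for a 'request rate too large' response."""
    def __init__(self, retry_after_s: float):
        self.retry_after_s = retry_after_s

def call_with_retries(operation, max_attempts: int = 5):
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RateLimitedError as err:
            if attempt == max_attempts:
                raise
            time.sleep(err.retry_after_s)  # honor the service's hint

# A fake operation that is throttled twice, then succeeds.
attempts = {"n": 0}
def write_document():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitedError(retry_after_s=0.01)
    return "created"

print(call_with_retries(write_document))  # "created" on the third try
```

Multiply this by thousands of threads and you can see why having the library absorb it is attractive.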
It's simple to leverage this library to perform bulk operations in Cosmos DB. We've been in preview for a few months now with the bulk executor library, and we have a lot of customers already in production using it to import and update terabytes of data. Let me put that into action. I have a nice demo here which compares performing a bulk operation using the core SDK versus the bulk executor library. This is our official samples page, if you have not already checked it out. One of those samples is a performance benchmark, which spawns multiple threads to migrate data into Cosmos DB; we'll be benchmarking against that sample. The other, which we're releasing documentation and samples for today, uses the bulk executor library; it's publicly available on GitHub. We'll be creating documents and ingesting them using the two different approaches: one spawns many threads doing point operations, the other is a single thread using the bulk API.
Let's compare the two here. Okay. I've created a collection with 50,000 RU/s. This is not a large collection; it's medium-sized. Let's do one thing. >> Can everyone see the text here? We're zooming in a little. This is the sample which writes data using the core SDK. I'll show you what I'm doing here: I'm going to create 50,000 documents and ingest them into the collection I just showed you. Let's see what happens. As you can see, it spawns 2,000 threads on this Azure VM here; it's a 16-core VM. This is a very nice example of what happens, right? As you can see, there are exceptions getting returned to you. You have to write custom logic to catch and handle these exceptions. They were able to write the 50,000 documents, though some of the requests also threw exceptions, and it took 37 seconds to complete. Let's go to another example.
Let's go one order of magnitude higher. You had a collection with 50,000 RU/s; let's say the load on your service is going to increase 10x. So I would go to the portal and scale that up by another 10x. Right now I have 50,000 RU/s; I scale it up to 500,000 RU/s. As you can see, we've scaled that collection up to 500,000 RU/s, right? Now let's go back to the client. Now I'm trying to import 500,000 documents into that collection. In the same scenario (I've kept the exceptions in to showcase this in the demo), you can see you're consuming only 36,000 RU/s out of the 500,000 you've provisioned. So what you would rather want to do is... let me show you the other option. This is the other sample, which uses the bulk executor library. It points to the same collection we had there, and again it's importing the same 500,000 documents. Let's see.
As you can see here on the client side, it's consuming much more of the provisioned throughput. This is basically becoming a live debugging session; we'll have to see what's happening here. While the import completes, let me go back to the presentation. So what does the bulk executor library provide? It heavily reduces the client-side compute you need to saturate your collection's throughput, compared to point writes. It abstracts away having to write custom logic for rate limiting and the other transient errors you get from the database. To consume more than 500,000 RU/s, all you would have to do is spin up two or three of these clients and you'd be able to saturate the collection, versus the 20 to 30 you would previously have needed.
About the bulk import. We provide the ability to update
Existing data in the collection using patches.
You can read more about the documentation and samples we
Have online. We also have something in
Preview, which is the ability to bulk delete data in a collection
By specifying a sql query.
All of this is available and we have support for our mongo api,
As well as a graph api in preview.
In the coming months, we'll make that available.
We have native integration into both our connector -- most of
You must be aware of azure data factory, which is a nice etl --
Nice mechanism to perform etl jobs. Now, this library has been
Onboarded on to the adf connector.
It brings all of these out of
The box for all your workloads now.
So really, happy for you guys to go out and start testing this
Out. Thanks. [Applause]
So check out the new bulk executor library. It will make bulk import and bulk update easier. It's a nice library where you don't have to spin up all these different threads doing point insertions; they become nice, easy, single method calls. Let's now jump over to global distribution. Partitioning is about taking a data set and dividing it up, putting different data on different machines. Replication is about copying the same data across multiple machines. So why is this interesting?
Number one, it's our means of providing high availability. Cosmos DB provides automatic failover across different geographic regions and manages the replicas in each region on behalf of the developer. We expose all of these replicas through a multi-homing API: you don't have to deal with managing addresses or URIs; it's all behind a single global URI. Should a failover ever be triggered, it will route from one region to another seamlessly and automatically. You don't have to do any app redeployment or restart. The other thing that makes this interesting is providing low latency. Packets can't move faster than the speed of light, so instead we ship the data closer to where your users are. We're bringing CDN-like capabilities, except instead of doing this for static content, we're providing it for your database workload.
What this looks like behind the scenes: you have a container, which is further divided into partition sets. These partition sets hold replica sets, with four replicas per region. Up top here, I have a container with airport as the partition key; I've divided the data across different airports. What you see here is us standing up replicas dynamically in different geographic regions.
Setting this up is incredibly easy. Here I have a screenshot of our Azure portal. You're not spending the next few weeks installing another set of VMs across different geographic regions. Instead, we call this turnkey, meaning it's as easy as turning the key in the ignition: I click on which regions I want across the different Azure regions, click save, and I'm done. I also want to point out, for folks who are thinking maybe this is a little too easy: we respect geo-fencing laws. If you're working in a sovereign cloud (a department of defense, China, Germany), we respect those geo-fencing rules. For the regions outside the sovereign clouds, of which we have about 30 or so, you can set up these massive deployments dispersed across continents. It takes about 180 milliseconds to go from the western U.S. to Australia; by replicating the data, I can serve it quickly no matter where the application lives.
For automatic failover, turning this on is as simple as a toggle switch, and you get a priority list of regions to fail over through, with a simple drag-and-drop interface. We also allow you to configure this at the client level. Let's say one application instance is in the U.S. and another is in Southeast Asia. A user makes a request; it goes to Traffic Manager, which routes the request to the closer Azure region and into that application instance. That instance sets its preferred locations based on an application config, so it's aware of what region it's deployed in and configures its first, second, and third priorities based on what's closest to it. That way we can tailor the failover behavior at the client level. In addition to automatic failover, we also support manual failover, should you want to run disaster recovery drills.
We back this with well-defined consistency models, rather than giving you a black-and-white view of the world with hard extremes. On one end, you have strong consistency, meaning you get the strongest read guarantees; however, you're paying a huge latency cost, because you're propagating every write around the world. On the other end, eventual consistency, where you get lightning-fast performance, reading from the replica closest to you, but not a lot of predictability. We give you click-stops in between. It's all about setting tight bounds on how eventual "eventual" is. Say you have a mission-critical application with strict RPO objectives: you can control exactly how eventual that is. For session consistency, we constrain the consistency guarantees to a single session, so within that session you're able to read your own writes, reads follow writes, and you have monotonic reads and writes. Let's say you have a social
media application. Let's say you're building the next Facebook. What you end up having to do is navigate these hard extremes if everything is strong consistency. I don't want slow load times just because a billion other users are liking comments; it's fine if I see their likes in my news feed a little late. But when I post a comment, I should observe my own comment, so I don't have a user experience where, if I refresh the page, my comment disappears. Where did that go? Let's say I'm trying to grab beers with my friend Ram. I'm like, hey, let's go have beers. I refresh and it disappears. I'm like, maybe it didn't go through. I do it again. Another stale read. Suddenly all my friends think I'm an alcoholic because I'm screaming at Ram going, let's go grab beer, let's go grab beer. Session consistency saves me from that embarrassment: when I write the comment and refresh the page, I'm guaranteed to read my own writes.
Rather than replicating operation by operation, we write to a log and replicate it in order. If my write operations are one, two, three, four, five, and I see operation five, I'm guaranteed to have seen one through four; if I see three, I'm guaranteed to have seen one through three. At the end of the day, packets can arrive out of order based on network conditions. We solve this by replicating everything with a sequence number from our log, guaranteeing everything applies in order, without gaps in history.
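Sketching that idea in Python (a simplification of how sequence-numbered log replication behaves, not actual engine code): a replica buffers out-of-order arrivals and only applies an operation once every earlier sequence number has been applied.

```python
class Replica:
    def __init__(self):
        self.applied = []   # operations applied, strictly in log order
        self.pending = {}   # out-of-order arrivals, keyed by sequence number

    def receive(self, seq: int, op: str) -> None:
        self.pending[seq] = op
        # Apply every operation whose predecessors have all arrived.
        next_seq = len(self.applied) + 1
        while next_seq in self.pending:
            self.applied.append(self.pending.pop(next_seq))
            next_seq += 1

r = Replica()
r.receive(2, "no, 7:00")        # arrives early: buffered, not applied
print(r.applied)                # [] so far
r.receive(1, "how about 8:00?")
print(r.applied)                # both applied now, in order: 1 then 2
```

No replica ever exposes operation 2 without operation 1, no matter what order the network delivered them in.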
Now let's say we're in a group chat: not just me and Ram trying to grab a beer, but everyone in this room. I post: hey, what time do you want to go? How about 8:00? No, that doesn't work for me, 7:00. Without ordering guarantees, some of you may see "7:00? No, 8:00" while others see "8:00? No, 7:00." Half of you end up showing up at 7:00 going, hey, where are my friends? The other half go, why are you guys already drunk? Did you start the party without me? Consistent ordering keeps all of these operations in order for everyone. So we give you these nice click-stops, available in the Azure portal, and we make them turnkey to implement. We can guarantee correctness for your application with respect to consistency, as well as give you ideal performance with respect to latency and throughput, at the click of a button.
One of the new things we're launching at Build is multi-master. With multi-master, we're making writes available in all regions. On top of this, you get lower-latency writes, because you're writing to the region closest to you, as well as better throughput, because you can accept writes all around the world. It's great that we've implemented this, but equally important, we want to guarantee correctness for your distributed system: all of our consistency models still apply just the same with multi-master. And one of the new things we've implemented is conflict management. Earlier I showed you the conflicts resource in our resource model.
What we're doing here: let's say I write to record ID 5 simultaneously from two different writers at two different points in the world. Now you have two different sources of truth. Which one wins? We give you a comprehensive conflict management system you can configure. Do you want last write or first write to win? Or, if we detect a conflict, you can register a body of business logic and we'll run it to do the conflict resolution for you, or we can take an approach where we understand each of the properties and do diffing.
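Here's the shape of such a policy in a small Python sketch (the field and policy names are illustrative, not the service's actual schema): pick a winner by latest write timestamp, or hand all the conflicting versions to registered business logic.

```python
def resolve_conflict(versions, policy="last_write_wins", custom=None):
    """Choose one winner from conflicting versions of the same record.

    Each version is a dict carrying a 'ts' write timestamp; 'custom'
    is a user-registered procedure that receives all the versions.
    """
    if policy == "last_write_wins":
        return max(versions, key=lambda v: v["ts"])
    if policy == "custom" and custom is not None:
        return custom(versions)
    raise ValueError(f"unknown policy: {policy}")

a = {"id": 5, "city": "Seattle", "ts": 10}  # written in region A
b = {"id": 5, "city": "Sydney", "ts": 20}   # written in region B, later

print(resolve_conflict([a, b])["city"])     # the later write wins
# Business logic instead: always prefer region A's version.
print(resolve_conflict([a, b], "custom", lambda vs: vs[0])["city"])
```

The point is that the resolution rule is data you configure, not code you have to scatter through every writer.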
So that's multi-master. Now that I have a system that is highly partitioned and highly replicated, with many machines distributed all around the world, one of the next questions is: how do I deal with schema and index management? Can we make that easy? This is the icing on the cake: schema-agnostic indexing.
I will show you how this works, but what is it? It's a proactive approach, as opposed to a reactive one: we automatically index every property of every record without you having to define hints or schemas up front. This helps if you have heterogeneous data, say many generations of devices, each with different capabilities, different sensors, and different properties being written, or if you're building an application where you want to move fast. I don't want to be bogged down doing ALTER TABLEs and a bunch of migration scripts over and over. How do we actually support this? Well, Cosmos DB implements an inverted index. The inverted index has been popular as a concept in information retrieval for a long time.
We pair it with a data structure called the Bw-tree. Here I have a record on the left; let's say it's a JSON record with a set of properties. The root node is the record ID, followed by child nodes for the properties of the record; I can have nested objects and nested arrays. At the leaf nodes, we preserve all the instance values of that record. What's cool is that the record on the left is self-describing: it tells you what properties it has, what the values are, and what the types of those values are. So if I have a second record come in with different properties, say it adds a revenue property as well as a dealer property, we can build a tree representation of that too. The inverted index is what you get by adding all of these trees together: if I merge the two trees, I have an inverted index.
If I run a query that says SELECT * where the first location is Germany, I can follow the left-hand path of this tree, traversing all the way down to Germany, and that leaf has pointers to record IDs one and two; the query result is records 1 and 2. If I run another one down the right-hand path, SELECT * from this collection where the dealer's first name is Hanz, record 2 is the only record that had not only that value but that whole path of values.
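The path-and-value merging just described can be sketched in a few lines of Python (a conceptual model only; the real engine stores this in a latch-free Bw-tree):

```python
def leaf_paths(node, prefix=()):
    """Flatten a JSON-like record into (path, value) leaf pairs."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaf_paths(value, prefix + (key,))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from leaf_paths(value, prefix + (i,))
    else:
        yield prefix, node

class InvertedIndex:
    def __init__(self):
        self.postings = {}  # (path, value) -> set of record ids

    def add(self, record_id, record):
        # Merging a record's tree into the index = adding its leaf paths.
        for path, value in leaf_paths(record):
            self.postings.setdefault((path, value), set()).add(record_id)

    def lookup(self, path, value):
        return self.postings.get((tuple(path), value), set())

index = InvertedIndex()
index.add(1, {"locations": [{"country": "Germany"}]})
index.add(2, {"locations": [{"country": "Germany"}],
              "dealer": {"name": "Hanz"}})

print(index.lookup(("locations", 0, "country"), "Germany"))  # {1, 2}
print(index.lookup(("dealer", "name"), "Hanz"))              # {2}
```

Because every leaf path of every record goes in, a record with new, never-before-seen properties is queryable the moment it's ingested, with no schema declaration.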
So, as you ingest records of heterogeneous schemas, we're coupling this inverted index with our Bw-tree, a latch-free data structure, which lets us keep up with very heavy write throughput while indexing synchronously. We also give you the ability to
Tune this. This is a gig database.
We expose a lot of knobs and levers. On the right-hand side i have
The index policy. Every collection will come with
One of these and what we've included is slash star, meaning
Index everything. Do that recursively.
I can set a list of included paths as well as excluded paths. In a traditional database, you whitelist columns: index columns A, B, and C. You can do that here, including paths A, B, C, or you can do something more novel. Instead of whitelisting, the old-school approach, you can blacklist: include slash star and exclude the paths you know you will never query on. You confirm that over time as you optimize your database. What that allows is that as you evolve your application model, 90% of the time when you change the schema you're adding a property you want to query on, so why make you maintain that list by hand? That's what automatic indexing does for you.
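A policy following the include-everything, exclude-a-few model described above looks roughly like this (shown as a Python dict; the `/rawPayload/*` path is a hypothetical example of a blob you know you never query on):

```python
# Sketch of an indexing policy using the blacklist approach:
# index everything recursively, then carve out known-unqueried paths.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [
        {"path": "/*"},             # index everything by default
    ],
    "excludedPaths": [
        {"path": "/rawPayload/*"},  # hypothetical large blob, never queried
    ],
}
```

New properties added as the application evolves are then indexed automatically, with no policy change needed.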
Let's talk about the new goodies we have at Build, and then we'll open this up for Q and A. For Build this year, as a quick summary: we've launched provisioning throughput across a set of containers. This allows collections on the same set of machines to share that throughput together. On top of this, we introduced a bulk executor library. Very, very easy: simple, single methods. We have launched this for .NET and Java. Rather than issuing individual tasks for each operation, you get client-side performance improvements.
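The client-side win comes from issuing operations concurrently instead of awaiting each round trip in turn. A rough sketch of that idea in Python with asyncio (this is not the actual bulk executor API, just the principle behind it):

```python
import asyncio

async def write_one(doc):
    # Stand-in for a single document write round trip.
    await asyncio.sleep(0.01)
    return doc["id"]

async def write_sequential(docs):
    # One round trip at a time: total time is the sum of all latencies.
    return [await write_one(d) for d in docs]

async def write_bulk(docs):
    # All writes in flight at once: total wall time approaches the
    # latency of a single round trip.
    return await asyncio.gather(*(write_one(d) for d in docs))

docs = [{"id": i} for i in range(100)]
ids = asyncio.run(write_bulk(docs))  # completes in ~one round trip
```

The real library also handles batching per partition and retry on rate limiting, which a naive loop would have to reimplement.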
On top of this, we have multi-master; that's in a preview sign-up. It allows you to do writes across any number of regions, giving you better write availability as well as better write throughput and write latency.
Then, as a cloud data store, we also want to let you restrict access from the public internet. Just because you have a database in the cloud doesn't mean you want anyone to be able to send requests to it. We gate all of this through an IP firewall. You can whitelist a set of IP addresses, and that works when you're running on, let's say, VMs where you control the IP addresses. The big missing link was how to do this for PaaS services, by setting up a virtual network. All of the other services have been supporting this, and Cosmos DB is launching VNet support. This is compatible with our IP firewall: you can combine the two, and only if the machine that's connecting meets one of these criteria will we allow the request through. Otherwise we'll say 404, I don't know what you're talking about.
We don't exist; not found. And then, of course, the last thing we're releasing at Build: a new asynchronous Java library. We've taken the time to refactor the Java library and build an asynchronous version, moving away from the synchronous model. This is naturally an I/O-bound activity, but nonetheless you get about a 2x performance improvement. We also used this as an opportunity to improve the developer experience; the interfaces are now a much cleaner set. So: we've done a high-level overview
of what this thing is, and covered understanding the resource model and some core concepts: what exactly a request unit is, how to partition my workload, how to replicate my workload, and how to configure indexing. And we covered the Build goodies. I would like to open this up for Q and A. We have a microphone in the center of the room; I encourage you to use it. If you have a hard time, given the seating arrangement, I can also repeat the question if you raise your hand.
>> Hi. >> Hello.
>> With all that said, I haven't seen anything regarding the storage model. Are you working directly with a storage account? Is it managed disks? What's underneath all of this storage to host this amount of data?
>> So Cosmos DB is fully built
on SSDs. We build on bare metal, so we can pin our compute to a very specific drive. What we want to avoid is an ephemeral disk, where you have additional overhead every time a request performs an I/O operation. So for Cosmos DB we build on bare metal and implement all the resource governance on the database side so we can take advantage of it. We own the stack end to end, and we're an SSD-based system.
>> Thank you. >> Hello, thank you very much for the very informative session. I have a couple of questions. Because of the scale, and for people coming from a traditional database background: is backup and restore needed in this environment? That's question number one.
>> Okay. So first, backup and restore. I'll tell you about our current state as well as where we're headed. In our current state, we automatically snapshot your database. These are logical snapshots, and we store them in geo-redundant storage so we can restore them to any number of regions. With that said, on our road map we're going to improve this model: we're definitely going to remove the extra Azure Storage step and use our log store directly. That will open the door for point-in-time restore, with much, much narrower recovery point objectives.
>> So is that on the portal, from a user interface? Will people have that capability for restore going forward?
>> Moving forward, yes. For the existing system, it is exposed through the Azure support system. And for these snapshots, we automatically turn them on on your behalf; we don't let you turn them off. We use these backups for our own purposes in case a region gets wiped out by an earthquake or a natural disaster. Ultimately you'd have multiple regions set up where you can fail over, but we have some customers who deploy to a single region. For the users who deploy to a single region, we have a backup so we can restore on your behalf to another region. So these are turned on automatically.
>> Encryption: is there an option to encrypt it?
>> Data is encrypted when we store it, and in transit we enforce encryption as well. We don't believe you should ever turn it off; we believe it should be on by default.
>> Okay.
Thank you. >> I noticed about a month ago in the portal you must have released changes where it will warn you if you're exceeding your throughput; you can click a link and it will give you a suggestion and scale up. I wonder if it's on the road map to go to the next stage and have an auto-scaling ability.
>> We have auto-scale as part of our road map. What we want to do is be very responsible: as we dial up and down, number one, we want to make sure we guarantee our SLAs around throughput and latency, and we want to make sure there are no surprises from a user perspective as we change things up and down. We're still thinking through how to solve those problems, but it's something we're thinking about very seriously. >> Okay.
Thank you. >> Hello. We currently use Cosmos as a DocumentDB replacement. If you want to do a large graph db, have you compared this against Amazon's offering as a PaaS service? >> Yes. For context for everyone: we have a graph API, the Gremlin API. We've made a bet on Gremlin. It's an open-source project, and graph is kind of the wild west; what the Apache foundation has done with TinkerPop is standardize a protocol as well as a language. Think of TinkerPop to Gremlin as the ANSI standard is to SQL. We've made a bet on Gremlin, and we're a scale-out system as opposed to a scale-up system. If you look at other graph databases, they're feature-rich, but where they're lacking is partitioned storage: how do you manage an extremely large graph? That's where Cosmos DB has really positioned itself, for massive-scale graphs. We actually have a session tomorrow afternoon specifically dedicated to doing scale-out of graphs. I highly encourage you to go check that out.
>> I have two questions. Do I have the ability to disable a collection in Cosmos, in case I'm using a dev database, so it doesn't get charged against my Azure spend?
>> For now, Azure Cosmos DB gives you a range of how far you can dial it down. You can't turn it off completely, because it's not a blob store just sitting in isolation; there's a live database piece of software running for you. With that said, we're always pushing the envelope to decrease that footprint more and more. And we also have techniques for archival if you want a tiering approach. >> Is there a way to set up an
index policy to do a range scan to query date/times? I've had to work around that by converting to a Unix time.
>> It's actually a matter of type systems. For date/times, to do range queries, that's how you do it today. With that said, because we're making a big bet on open-source APIs where there is date/time support as a data type, we're expanding our index to support different type systems as well. Look for that as a future improvement so you don't have to do any of these work-arounds.
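The work-around mentioned, storing date/times as Unix epoch numbers so the index can range-scan them, looks roughly like this (the field names and dates are made up for illustration):

```python
from datetime import datetime, timezone

def to_epoch(dt):
    """Store timestamps as numbers so a range index can scan them."""
    return int(dt.replace(tzinfo=timezone.utc).timestamp())

docs = [
    {"id": 1, "ts": to_epoch(datetime(2018, 5, 1))},
    {"id": 2, "ts": to_epoch(datetime(2018, 5, 7))},
    {"id": 3, "ts": to_epoch(datetime(2018, 5, 20))},
]

# Equivalent of: SELECT * FROM c WHERE c.ts >= @start AND c.ts < @end
start = to_epoch(datetime(2018, 5, 1))
end = to_epoch(datetime(2018, 5, 10))
hits = [d["id"] for d in docs if start <= d["ts"] < end]
print(hits)  # [1, 2]
```

Because the values are plain numbers, an ordinary range index handles the comparison with no date-type support needed.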
>> Several questions. First, you're talking about multiple partitioning, but my understanding is that only one partition can be used, right? One connection. What exactly does multiple partitioning mean? Are you talking about multiple names as a partition, or a mechanism to make sure the partitions are equally distributed?
>> There are two modes. One is called gateway mode; the other is called direct mode. In gateway mode, we proxy the connection through a gateway server, a front-end server.
>> That's per request. But you're talking about connection partitioning, multiple partitions within one connection. How is that possible? Right now, you actually have to set up a partition.
So in that case, once you decide on a partition, that's kind of fixed?
>> We have two ways of connecting. In gateway mode, the gateway will process the request. In direct mode, we allow you to connect directly to the replicas; we open a connection for each of these replicas. The moment a partition splits into two child partitions, there's a back-and-forth dance where we notify the client, and the client gets the updated topology. From the user's perspective, all of this partitioning complexity gets abstracted away as much as possible. >> Okay.
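The direct-mode routing just described, hashing a partition key into a client-side partition map and swapping in new topology on a split, can be modeled with a small toy (this is a sketch of the idea, not the actual wire protocol; the endpoint names are invented):

```python
import hashlib

def hash_key(pk):
    # Deterministic hash of the partition key, reduced to a 0-99 range.
    return int(hashlib.md5(pk.encode()).hexdigest(), 16) % 100

class PartitionMap:
    """Toy client-side map: hash ranges -> replica endpoints."""
    def __init__(self, ranges):
        # ranges: list of (upper_bound_exclusive, endpoint).
        self.ranges = sorted(ranges)

    def route(self, pk):
        h = hash_key(pk)
        for upper, endpoint in self.ranges:
            if h < upper:
                return endpoint
        raise KeyError(pk)

    def split(self, old_upper, mid, left, right):
        # On a partition split, the client swaps in the updated topology:
        # the old range [.., old_upper) becomes [.., mid) and [mid, old_upper).
        self.ranges = [r for r in self.ranges if r[0] != old_upper]
        self.ranges += [(mid, left), (old_upper, right)]
        self.ranges.sort()

pmap = PartitionMap([(50, "replica-A"), (100, "replica-B")])
before = pmap.route("device-42")
pmap.split(50, 25, "replica-A1", "replica-A2")
after = pmap.route("device-42")  # same key still routes deterministically
```

The point is that routing stays a pure client-side lookup; only the map changes when the service notifies the client of a split.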
Second question: right now, for graph db, when you look at how it's stored, it's saved as a list of documents. I'm not able to query that through the SQL API; it's not compatible.
>> Two APIs on the same collection.
>> It's saved as JSON. Can those two things be compatible as well?
>> It's saved internally as an atom-record-sequence. That's a superset of all of these data models, and we project it as a piece of JSON. We co-locate edges on the source vertex. And people do want to interact with the same data set through both APIs; we have people who do this. Documents are treated as vertices in the graph, and you can do relational queries on top of traversing these. We have one query engine that's hosted on the replica itself; that's what the SQL API talks to, and that's what is able to do queries across multiple partition sets. >> Okay.
Thank you. >> So, a question about cost management, as somebody else alluded to. We're dealing with a couple of different microservices, and rather than having a separate collection for each service, given everything is indexed, it seems feasible to have a more generic collection where multiple services within a team share that collection. I just wanted to get your thoughts on that trade-off and when not to do that. From a cost management perspective, you get all the RUs divvied up across those multiple services in that common collection.
>> I think it comes down to whether the same data is being shared across the microservices or not. At the end of the day, we still do advocate combining different entity types. The reason is that this gives you benefits for queries: rather than doing some kind of relationship join, you can co-locate the properties across different entity types to query them back efficiently in a single request, and you can do transactions across different types of records. With that said, there are also scenarios where you have very, very distinct data sets where you really don't care about transactions or about them coming back from the same query. For that, one option is the new database-level throughput: the collections share that throughput with each other, which helps manage the cost. If one workload sometimes spikes, you can share between those two different collections.
>> Hi. Currently I use RavenDB. Along the lines of sharing RUs across collections: Raven will automatically create a container for each type of document and split them out that way. Are there any plans for supporting that in the .NET driver?
>> Our .NET driver tries to move away from this whole thought process of a collection per type. That's because we are a scale-out database that supports multi-record transactions. With that said, from a driver perspective, it just views these as records, so you can serialize any kind of POCO, even different types of POCOs. Let's say you're using the SQL API: as long as it's valid JSON, and assuming you're using .NET, it will say, okay, I know how I've done this with other POCOs, and when I deserialize it back out I can do so accordingly. From a best-practice standpoint we advise combining types, but should you want a collection per type, we're not going to block you from doing it. In fact, some of this new functionality is really centered around people who want to do that. What ends up happening when you create many, many collections is that you find one has most of the request volume while the other, smaller namespaces are hardly used. What's really cool is that with database-level throughput you can provision throughput both at the collection level and shared across multiple collections. So the one collection with a ton of throughput can get a dedicated allocation, and all the other smaller collections can pool throughput together.
>> Rate limiting is one of the challenges. We handle it with retries today. With the bulk executor, is everything abstracted, so we don't even have to consider those errors?
>> They're handled internally. Whether it's rate limiting, a request time-out, or service unavailable, all of those are handled within the bulk executor.
>> And, finally: out of a thousand records, if I have 100 failed records, do I get a summary of those hundred, or how should we handle those?
>> So one thing we catch is a bad document: if you look at the API, we return a list of bad input documents, for example documents with a null id or similar problems. We don't fail the whole job; we don't fail the whole API call, but return the list of documents that caused the exception. Otherwise, we throw the exception and you will have to retry the whole API call.
>> Are there plans so I can go and just import a file?
>> The library is more for when you want to add bulk operations as part of your core service. For something like a one-off import, with Azure Data Factory you can actually set up a nice visual pipeline and all of that.
>> Do you have plans for bulk delete, like a purge? Given a table structure, just bulk delete the data?
>> That's in preview. Bulk delete is in preview.
>> Any tentative time when that might be available?
>> It's already in preview; it's just not part of our public NuGet package yet. Reach out to us and we'll be happy to tell you how to use it.
>> On indexing multiple fields, do you have any status on the cost of that?
>> So by default, consider a 1-kilobyte document: that's the baseline. As your document gets bigger, the RU consumption increases, but it's not linear. If you go from 1 KB to 100 KB, it doesn't mean you're going to consume 100 times the RUs.
>> Is there a cost estimate? Like, this kind of operation is going to have x cost?
>> We have online guidance. Think of it this way: as the number of bytes grows, the I/O grows, as does the number of index terms. If you have more and more properties getting indexed, we'll spend more CPU cycles generating those terms. You can tune the index policy to select what you want indexed; the cost is directly proportional to the I/O.
>> So the question, to repeat it for the rest of the room, was: what about storage? Is it purely a measurement of throughput and not storage? Throughput is RUs per second, and storage is how many kilobytes you have on the system.
>> You have a metric for both, including index usage. If instead of indexing ten fields you index a hundred, you will see an increase. >> I have
two questions. MongoDB has a certain kind of concurrency support. With the Mongo API on Cosmos DB, is the same thing supported or not?
>> Could you repeat the question?
>> Where is the issue? The concurrency. You have two users updating the same record.
>> What Cosmos DB does is use optimistic concurrency control. Each record has an ETag, and every time you modify a record, the ETag value updates. If the ETag matches at the time of update, that means nothing else has modified the record. If it mismatches, you have detected a concurrency violation. We do conditional updates on that ETag value; that way we avoid a locking mechanism. Only if the ETag matches does the update go through.
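The ETag check-and-update described above can be sketched as a toy store in Python (illustrative only; the real service returns the ETag in response headers and rejects stale writes with HTTP 412):

```python
import uuid

class Store:
    """Toy optimistic concurrency: every write bumps the record's ETag."""
    def __init__(self):
        self.data = {}  # record id -> (etag, body)

    def read(self, rid):
        return self.data[rid]

    def upsert(self, rid, body, if_match=None):
        current = self.data.get(rid)
        if current is not None and if_match != current[0]:
            # Someone else modified the record since we read it.
            raise RuntimeError("412 precondition failed: ETag mismatch")
        etag = str(uuid.uuid4())  # new ETag on every successful write
        self.data[rid] = (etag, body)
        return etag

store = Store()
etag = store.upsert("1", {"qty": 5})                  # create
etag2 = store.upsert("1", {"qty": 4}, if_match=etag)  # ok: ETag matches
try:
    store.upsert("1", {"qty": 3}, if_match=etag)      # stale ETag: rejected
except RuntimeError as e:
    print(e)
```

On a mismatch the client re-reads the record, reapplies its change, and retries, so no locks are ever held.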
>> Another question: does Cosmos DB support change tracking?
>> For change tracking, we have a concept called the change feed. It's not an operation log; every time you insert or update a record, we have a structure we're appending to. This is how we build the database. The change feed is a direct feed based off of that log sequence number. If you want to read changes off the database, use the change feed; it's a very, very efficient way of getting reads: only new records, no existing records. If you're looking at building an eventing pattern, we have a ton of people doing event-sourcing scenarios off of that change feed. >> Okay. Thank you.
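The change feed mechanics just described, an append-only structure keyed by log sequence number that consumers read from a cursor, can be modeled with a small toy (a sketch of the concept, not the real feed protocol):

```python
class ChangeFeed:
    """Toy change feed: an append-only log with per-consumer cursors."""
    def __init__(self):
        self.log = []   # (lsn, record id, body) in write order
        self.lsn = 0
        self.docs = {}  # current state of each record

    def upsert(self, rid, body):
        self.lsn += 1
        self.docs[rid] = body
        self.log.append((self.lsn, rid, body))

    def read_feed(self, cursor=0):
        """Return changes after `cursor`, plus the new cursor position."""
        changes = [c for c in self.log if c[0] > cursor]
        new_cursor = changes[-1][0] if changes else cursor
        return changes, new_cursor

db = ChangeFeed()
db.upsert("a", {"v": 1})
db.upsert("b", {"v": 1})
changes, cur = db.read_feed(0)    # both inserts
db.upsert("a", {"v": 2})
changes, cur = db.read_feed(cur)  # only the new update to "a"
```

An event-sourcing consumer just persists its cursor and polls; it never rescans existing records, which is why the feed is cheap to read.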
>> So, being able to create an ephemeral database for the purpose of integration testing on a CI/CD service