Good morning, everyone. I'm Tim Sander, a program manager on the Azure Cosmos DB engineering team, and I hope everyone has had their coffee and is awake today. We're going to be building an event-driven app together using Azure Cosmos DB and Azure Functions. Just a show of hands: who's used Azure Cosmos DB before? All right, good. Who's used Functions before? Perfect. So this is a level 300 session, so we're going to skip a lot of the basics and really dive into the interesting stuff. Today we'll be focusing on the change feed in Cosmos DB and how it integrates really nicely with Azure Functions. Throughout this 45-minute session, we're going to build together a sample app for a made-up airline called Contoso Airlines.
The first thing we'll do with our app is ingest data in real time: location data for Contoso Airlines' fleet of airplanes. So basically, a document that has the latitude, longitude, altitude, and things like that for each plane. Once we've ingested that data into Cosmos DB, we're going to use it to send alerts if a plane enters a no-fly zone, then display a real-time map of plane locations, and finally display an upcoming-arrivals board for each airport to show the arrivals that are coming soon. So our first step is just to figure out how we want to ingest all this data into Cosmos DB.
Now, when you ingest data into Cosmos DB, it's going to be a JSON document. You likely have some sort of on-prem technology that would initially pre-process the data and eventually send it to a function that writes it to Cosmos DB, and you'd end up with something like this: a JSON document that has a unique GUID for the plane, the timestamp that the location data was sent, the plane's current altitude, the latitude and longitude, the destination, and so on. So this is a plane going from Seattle to Orlando, and it's a redeye, leaving at 10:00 PM and arriving at 7:00 AM.
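A location document of the shape described here might look something like this; the property names and values are illustrative, not the exact schema from the session:

```json
{
  "id": "c8e2a7a4-4f7e-4d2b-9d8e-2a1b3c4d5e6f",
  "timestamp": "2019-11-04T22:15:00Z",
  "flightNumber": "ContosoAirlines621",
  "tailNumber": "N12345",
  "origin": "SEA",
  "destination": "MCO",
  "latitude": 47.44,
  "longitude": -122.30,
  "altitude": 31000,
  "speed": 480
}
```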
So this will be ingested into our locations container, and the first thing we do anytime we create a container in Cosmos DB, as you all probably know, is pick the partition key. This is absolutely critical for an IoT application like this that's going to need to scale pretty significantly. I'm sure, since you've all used Cosmos DB before, you're pretty familiar with the characteristics that make a good partition key: you want your partition key to have a high cardinality, it should do a really good job of evenly distributing reads and writes, and in a lot of read-heavy workloads it's helpful if the partition key is included in the filter of your most frequent query, right?
Is this container, the locations container where we're ingesting this data, read-heavy or write-heavy? First question of the day, and I have some prizes. So, a volunteer? Right, yeah, write-heavy. I'll try my luck at throwing one... that was a strike, perfect. So it's write-heavy, and because of that we don't really care about the partition key's value for issuing queries. We want the partition key for this container we're ingesting into to satisfy the other properties. So, question two: what should it be? The departure airport, the arrival airport, the airplane's tail number, or a unique GUID for every single piece of data that we ingest into Cosmos DB? Yeah, the tail number. The tail number is a good choice; it would do a pretty good job of evenly balancing requests and storage.
What we really want here, though, is a lot of cardinality; that's the only thing we care about for this container. We want something with a high cardinality so that when we ingest data into Cosmos DB, the total size of the container can grow really large, because each partition key value stays small and we can just add more partitions. Each partition in Cosmos DB can grow up to 10 GB, so we want to keep each individual partition key small. Arrival airport? Arrival airport has that same cardinality problem. Yeah, the idea is something that's unique for each document; whoever shouted that, come meet me after, I have another prize for you, but it's too far to throw. The id is a really awesome property to partition by. It obviously provides absolutely no value for issuing queries, but it's really awesome for guaranteeing a high cardinality as we ingest data. Each data point that we ingest will have a unique id property, so we'll evenly distribute throughput and storage and be able to ingest data quite well, with small partitions. And as we ingest data into Cosmos DB, it will automatically store it behind the scenes based on this property.
Here's a super high-level view of our architecture. Of course, if you were actually building this, there would be a lot more in the architecture than just Cosmos DB and Functions, but to keep it simple, since we have 45 minutes, we'll focus on these components. We're going to ingest data into the locations container directly and then have different microservices, which in this case are just Azure Functions, subscribe to the Cosmos DB change feed and each do a separate, independent, small task. The first one we'll focus on building today is the microservice that sends an alert if an airplane enters a no-fly zone.
So this is the logic we'll use to decide if the airplane is in the no-fly zone. We've dumbed it down quite a bit, since this is just an app we're building in 45 minutes. Any guess what location this is? This is the convention center. So if the plane comes close to the convention center, we'll send out an alert that it's in a no-fly zone. In reality, this would probably be things like other airports, but to keep it simple, this is the logic we'll use: we'll send out an email alert to, let's say, air traffic control and the airline.
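The dumbed-down no-fly-zone check described here can be sketched as a great-circle distance test. This is a minimal sketch: the center coordinates, radius, and property names are assumptions, not values from the session.

```python
import math

# Hypothetical no-fly-zone center (roughly the convention center) and radius.
NO_FLY_CENTER = (28.42, -81.47)  # (latitude, longitude)
NO_FLY_RADIUS_KM = 5.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def in_no_fly_zone(doc):
    """True if a location document falls inside the no-fly zone."""
    d = haversine_km(doc["latitude"], doc["longitude"], *NO_FLY_CENTER)
    return d <= NO_FLY_RADIUS_KM
```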
I'm going to start off with a demo and then take it backwards and deconstruct how we made it. Perfect. So I'm going to start a data generator that generates sample airplane telemetry, just those same location documents, and writes them to our locations container. Then I'm going to start my email alert function. I'm actually running it locally right now; the reason is that we're going to be modifying it a lot throughout the session, so I want to be able to quickly modify it and get it started again. Of course, it's really easy to deploy functions, but we'll run it locally for now just to keep the development experience quick.
The next thing I'm going to do is come to Cosmos DB and take a look at my locations container. There we go. If I come to the scale settings here... I can see that my browser froze. There we go, cool. Looking at the scale settings, I can see that I partitioned on the id value, so I'm ingesting data based on id and it's stored behind the scenes based on that. Now note here that I provisioned this Cosmos DB container with autopilot mode. When I set it up, I just decided I wanted a locations container to collect telemetry, partitioned on id, but I actually didn't have to think at all about the scale. I simply set up this container with autopilot provisioned throughput, and it's going to automatically scale between 10,000 and 100,000 RUs for me; I won't need to think about this at all. I just picked a really wide range that I want to scale between. That's really awesome, because I'm going to be running this demo today and pumping a lot of data into the locations container, but after I'm done with the demo, it might be a few weeks before I run it again. Cosmos DB is automatically going to scale down to that lower level for you, so it's truly serverless in the sense that you don't have to think about scale.
If I come to my items here, I can see the sample documents that have been written to Cosmos DB. Again, it's just those sample data points; this is a flight going from San Diego to Orlando. I have a Logic App that's connected to that Azure Function, so every single time a plane enters a no-fly zone, I'm going to trigger my Logic App, which is then going to send an email to me. The function has been running for quite a while, and every time you see "email alert" here, it's not actually sending out an email alert; this is just the output, every time the function runs, that the email alert function ran and there was another write to Cosmos DB. Hopefully I got one email... perfect, I got one at 9:22: Contoso Airlines flight 621 entered the no-fly zone near Orlando.
Let's take a look at how that works behind the scenes and walk through the code. It was actually really simple to implement. When I created my Cosmos DB container, the partition key was really important, but other than that there really wasn't a lot I had to think about. The same goes for the function: Functions automatically reads from the Cosmos DB change feed, and I really didn't have to think much about how that worked; I also didn't have to think about the function's scale. This function uses the Azure Cosmos DB trigger, so every single time we do a write or an update to the locations container, we trigger the logic in this function. We loop through all the changes here, and for each change, if the aircraft is in the no-fly zone, we execute the logic down here. That logic is really simple: if the aircraft is in the no-fly zone, we build a payload and send it to a Logic App, which we trigger through this line of code down here, and that goes and sends out the email.
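The function body just described can be modeled like this, with the no-fly-zone test and the alert sender passed in so the flow is visible on its own. The payload fields and Logic App URL are hypothetical, not the session's actual code:

```python
import json
import urllib.request

LOGIC_APP_URL = "https://example.com/workflow-trigger"  # hypothetical endpoint

def handle_changes(changes, in_no_fly_zone, send):
    """Mimic the Cosmos DB-triggered function: for each changed location
    document, build and send an alert payload if the aircraft is inside
    the no-fly zone. Returns the alerts that were sent."""
    alerts = []
    for doc in changes:
        if in_no_fly_zone(doc):
            payload = {
                "flightNumber": doc["flightNumber"],
                "latitude": doc["latitude"],
                "longitude": doc["longitude"],
            }
            send(payload)  # e.g. POST the payload to the Logic App
            alerts.append(payload)
    return alerts

def post_to_logic_app(payload):
    """One way to trigger the Logic App: POST the JSON payload to its URL."""
    req = urllib.request.Request(
        LOGIC_APP_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```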
So, one down, two to go: we've alerted our customers and air traffic control if an airplane entered a no-fly zone, and we did that through a Cosmos DB trigger in Azure Functions. Now, Azure Functions behind the scenes was, as most of you likely know, using the Azure Cosmos DB change feed. The change feed in Cosmos DB is my absolute favorite feature in the product. You want to think of the change feed as basically a list of documents, per partition in Cosmos DB, in the order in which they were modified. So if you're doing writes to Cosmos DB like we're doing here, you can just read from the change feed to get a stream of the latest updates and writes that were done to Cosmos DB.
There is one change feed per partition in Cosmos DB, and this is what allows you to scale really well as you have many partitions, a lot of throughput, and more data. You read from the change feed using the change feed processor library, and you probably noticed I wasn't using the change feed processor library directly in that function. Functions automatically uses the change feed processor library for you, and you don't really have to worry about how it works: every single time there's a write to Azure Cosmos DB, it triggers the logic in the Azure Function, as you saw, and sends the email if we meet the criteria. But since this is a level 300 session, I do want to talk more about how the change feed processor library works behind the scenes.
Now, of course, today we used the processor library through Functions, and the benefit of Functions is that it's serverless, so we don't have to worry about scale: we just write the function, we can go deploy it, and it'll scale automatically. But if you want more control over scale and more control over reading from the change feed, you can take the change feed processor library and deploy it on your own, wherever you want to host it. And the great thing about the change feed processor library, no matter how you're using it, is that it guarantees at-least-once semantics: fault-tolerant reading of the Cosmos DB change feed. This means that if you're doing writes to Cosmos DB, in this case the writes of our location data, we are guaranteed to process every single write at least once, so we're never going to accidentally miss processing an airplane's location. That at-least-once guarantee is really critical for mission-critical apps that need it.
The polling of the change feed and the management of your checkpoints, for where you are in reading from the change feed, is automatically managed by the change feed processor library for you. In addition, Cosmos DB is going to have different partitions behind the scenes, and the change feed processor library automatically reads these different partitions in parallel. Functions does this scale-out automatically for you, so as Cosmos DB has more data and as you're ingesting more data, Functions will read these partitions in parallel.
Now, it wasn't too important for this scenario, but in a lot of cases we actually need some guarantees about the order in which we process documents. It's not really critical here, but you might want a guarantee that says: as I'm writing documents for different planes, then for a specific plane, let's say a specific tail number, I never process a document that was written later before a document that was written earlier. You want to guarantee that for a certain plane, if you write documents A, B, and C, you'll process them in that exact same order, and the way to guarantee that is by partitioning on that property. We guarantee that within a particular partition, we're going to process the changes in the order they were written.
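That per-partition ordering guarantee can be pictured with a toy model: group writes by the partition key value, and each group keeps its write order. (Real partitions are managed by the Cosmos DB service; this only illustrates the per-key ordering, with an assumed `tailNumber` property.)

```python
from collections import defaultdict

def partition_writes(writes, partition_key):
    """Group writes by partition key value; within each partition the
    change feed preserves the order in which documents were written."""
    feeds = defaultdict(list)
    for doc in writes:
        feeds[doc[partition_key]].append(doc)
    return feeds

writes = [
    {"tailNumber": "N1", "seq": 1},
    {"tailNumber": "N2", "seq": 1},
    {"tailNumber": "N1", "seq": 2},
    {"tailNumber": "N1", "seq": 3},
]
feeds = partition_writes(writes, "tailNumber")
# Within the N1 partition, documents come back in write order: seq 1, 2, 3.
```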
The way the polling works behind the scenes is actually really cool. To the developer and the end user, it's essentially a push model: every time there's a change, we trigger that Azure Function to send the email alert. Functions automatically checks the locations container for changes. If it doesn't find any changes, it waits an interval that you can specify; you don't have to worry about setting this up in your logic, it just happens behind the scenes. If it does find a change, say a write of a document to the locations container, it processes it and then, immediately after processing, checks for a new change. Because there's no wait after you process a document, no delay until you check again, this lets you really stream data through Cosmos DB in real time, do a really good job of saturating your throughput, and ensure you're processing all these changes in real time.
Now, after we process document B, we immediately check again, but if we don't find anything, we wait that specified feed poll interval before we check again. There is a very small cost every time you check a partition's change feed and there's no data there; it's pretty trivial, but it is a non-zero cost, so you have control over that poll interval and how long you want to wait between checks. After waiting, we check again, find document C, process it, and it continues. All of this is managed automatically for you behind the scenes; you don't have to set it up when you use a function. Basically, all you need to know is that you can pick the feed poll interval, and every time there's a write, your function will be triggered with a maximum delay of that feed poll interval.
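That polling behavior can be sketched as a loop that only sleeps when the feed comes back empty. This is a toy model, not the processor's internals; the delay value and function shapes are illustrative:

```python
import time

FEED_POLL_DELAY_SECONDS = 5.0  # configurable; Functions exposes a similar setting

def poll_change_feed(read_changes, process, should_stop):
    """Toy model of the loop the change feed processor runs for you:
    drain changes back-to-back, and only sleep when the feed is empty."""
    while not should_stop():
        changes = read_changes()
        if changes:
            for doc in changes:
                process(doc)
            # No delay here: check again immediately after processing.
        else:
            time.sleep(FEED_POLL_DELAY_SECONDS)
```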
Excellent. The next thing we'll do is build out that map with the current airplane locations. So we've built one of our microservices, the one that alerts if an airplane enters a no-fly zone; we're now going to build the next one, which displays a real-time map of plane locations. We have two options for building this. We could recalculate the latest airplane location every single time someone refreshes the web app. Or we could build a materialized view that has the current location of every airplane, and then every time a web app needs to reference this data, it just calls into that: we're not recalculating it every time, but keeping that value persisted somewhere. The point of materialized views is that different services that take a dependency on this data can simply go and access it. This is a sneak preview of a question we'll think about in a couple of minutes.
Now, if we recalculate the current location of each plane on demand, rather than maintaining a materialized view as we do the writes to Cosmos DB, we're going to do it with this query against the locations container. There are some alternatives to this query; you could probably do a GROUP BY and find the max timestamp for every plane and have that be the location, but this is one possible approach we'll consider, where you do an ORDER BY for a particular flight number and take the top value. Now, how many times will we need to run this query per second? Will it be run a lot, or not too often? Really quite a bit; I'm going to conservatively estimate 10,000 times per second. It's going to be run once per plane for every web app or service that calls into our Cosmos DB container, so every time we want to display the location for a particular plane, we have to query the entire location history by issuing this query. And this query is going to be a cross-partition query, right? We partitioned on id, which provided a lot of value for doing writes, but it doesn't provide much value for doing reads. In this case we need a filter and an ORDER BY; we could actually optimize this with a composite index in Cosmos DB, but as you'll see in this demo, it's still quite an expensive query.
I'm going to open up the Data Explorer here, come to my locations container, and issue a new SQL query. So again, I can issue queries right through the portal, and I'm going to type out that query: select the top one from c where c.flightNumber equals, let's say, ContosoAirlines16, and then do an ORDER BY on c.timestamp. The idea is that every time our web app wanted to obtain the current location for a plane, it would have to issue this query. The query runs, it's pretty fast, and it outputs that single document with that plane at its latest timestamp. So this is an aircraft going from London Heathrow to Philly, and this is its current location: longitude, latitude, speed, and altitude.
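Written out, that query looks roughly like this; the flight number is the one from the demo, while the property names and the descending sort (needed to take the most recent document) are my reading of it:

```sql
SELECT TOP 1 *
FROM c
WHERE c.flightNumber = "ContosoAirlines16"
ORDER BY c.timestamp DESC
```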
If we go and look at the RU charge, we find it takes about 35 RUs. Now, this is fine if this is an ad hoc query you need to run every now and then, but this is going to be something really central to our application that tons of different web apps out there are going to call and reference. We could have dashboards displayed on websites, or air traffic control could be running this query really frequently for each plane, so we want to make it as optimized as possible. So while our initial writes to the locations container were a very write-heavy workload, this scenario is quite a bit more read-heavy; let's think about what we could do to optimize it.
We have two options for building this. We could recalculate the latest airplane location each time someone refreshes the web app, or we could build a materialized view that persists the current location of each airplane. Which do you think we should do? Who thinks A? Who thinks B? OK, good. I don't know if I sold it well enough; if you had all said A, I would have just kind of given up at that point. Perfect, so it's B: we'll build the materialized view with the latest location of each aircraft.
Now, who's familiar with the materialized view concept? That's about half the room, so: it's really simple. We're taking the results of a query and persisting them somewhere. If you find you need to run the same query repeatedly, over and over again, the idea is to precompute the results of that query and store them somewhere. Then, every time you need to run that query, you can just do a simple lookup of the results instead of repeatedly running the query. In many cases, the materialized view might compute something like an aggregate, or provide a subset of the data; the idea is to optimize for reads and quicker lookups.
You can build this materialized view by reading from the change feed, and we can do that really easily using an Azure Function. That's the idea here: to display the real-time map of plane locations and anticipated arrivals, we precalculate the materialized view in the function every time there's a write to the locations container, and then we persist it in a container called the current locations container. Any web app or any other service that needs to access that information calls directly into this container; it never needs to use the locations container directly. So while we're reading from the change feed of the locations container, we're doing that to serve our really frequent queries, and there are likely a lot of instances where you would want to look up the location materialized view. No worries.
Perfect, thank you. Hmm, I don't think it's on yet, right? It's on? Perfect. Well, I was going to shout, and then I think my rating would go down by a solid star. Something always goes wrong in sessions like this; the one thing I have no backup plan for is the Wi-Fi, so that's been great so far. So: mic, so I don't have to shout; Wi-Fi, you're kind of in a bad spot. We'll keep going. So what we do here is build that materialized view in the current locations container, and we'll just call into that directly.
Let's actually take a look at the cost of this. I don't think anyone said option A, recalculate the value each time, but if you were a developer that chose option A and went with it, you would probably not be a developer anymore; you'd probably be out on the market looking for a new job, because doing that would require about 350,000 RUs. This is a very rough estimate, of course; you could have way more than 10,000 lookups per second, but in total this is going to consume 350,000 RUs, so it's going to be very, very expensive. Now, with the materialized view option, we're going to pay extra for each write: we'll be doing writes to the locations container, then calculating the most recent value and updating the current locations container, and this will obviously use more RUs than before because we're doing double the number of writes.
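The comparison above can be put as back-of-the-envelope arithmetic. The 10,000 reads per second and the 35-RU query cost are from the session; the ingest rate and per-write RU cost below are hypothetical placeholders:

```python
# Back-of-the-envelope RU comparison (rough numbers).
reads_per_second = 10_000

# Option A: run the 35-RU cross-partition query for every read.
query_cost_ru = 35
option_a = reads_per_second * query_cost_ru        # 350,000 RU/s

# Option B: materialized view. Every ingested location now costs an extra
# write to the current-locations container, but each read is a 1-RU point read.
writes_per_second = 1_000                          # hypothetical ingest rate
write_cost_ru = 10                                 # hypothetical cost per write
point_read_cost_ru = 1
option_b = writes_per_second * write_cost_ru * 2 + reads_per_second * point_read_cost_ru

print(option_a, option_b)  # 350000 vs 30000 under these assumptions
```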
However, instead of each read costing 35 RUs, reads cost one RU, because we're going to do something in Cosmos DB called a point read. A point read in Cosmos DB is when you look up a document by its id value. Queries in Cosmos DB, if you write them efficiently, will be pretty fast and pretty efficient, but nothing in Cosmos DB is faster than just doing a point read, where you have the document's id, that key, and you go and get the value. So we can improve our application pretty significantly with this materialized view. Let's actually take a look at how we would build that and walk through some code.
So I've had the data generator pumping out data for quite a while, I've stopped that earlier function, and I'm going to start this new function, which computes the materialized view. We'll let that run for a little bit and jump into the Azure portal. So obviously, the alternative to running that query over and over again is, instead of looking at the locations container, to access the items that are in the current locations container. When I access the current locations container, I'm basically going to get every document by its id value. So when I come here... I can do a simple lookup. That's always fun; there we go, cool. It must have been because I didn't have the web page open for a while. When I come here, I can do a simple point read of the document, and I'm going to access it by its id value. So when I want to access this container, I'm going to do what's called a point read in Cosmos DB: I supply the id value in ReadItemAsync, the call to Cosmos DB, and I'm going to supply the Contoso Airlines 905 id value to do that lookup. I'm going to be running this very frequently, and if I do this call to Cosmos DB, because this is such a small document, it's going to use about one RU in total.
What would be a good partition key for this current locations container? Any guesses? Yeah, the flight number, right. The flight number is really nice because that's going to be something we'll know: if we're looking for a specific plane, we'll have its flight number, and we can actually pass it into the ReadItemAsync call, and that's going to make the call really performant because we'll be able to target a specific partition. So we obviously have this container also on autopilot mode, and we very intelligently chose flight number to be the partition key here.
So this has been running for a while now, pumping out more data. Right now I have two microservices; I stopped the other one, but I had it running before, and each one has a separate lease, a bookmark for where it was in reading the change feed, so we'll take a look at that now. This leases container wasn't here when I started my demo; if anyone was paying careful attention, I started out with three separate Cosmos DB containers, and this leases container wasn't there. The leases container basically holds your bookmarks for where you last were in reading the change feed. We keep one bookmark per partition, per microservice that reads from the change feed. You don't actually need to go and manage the leases collection, but we're going to take a look at what's in there.
If I look at this one document here, it's basically a bookmark for a particular partition for my alert microservice, the email microservice that went and sent out emails. This is a bookmark: when I stopped my change feed, the change feed processor library bookmarked where I last was in reading that partition's changes, so if I were to start it again later on, I could just pick up where I left off. This is stored in just another Cosmos DB container, so it's very easy to manage, and Functions will actually set up this leases container automatically for you. In general, you don't need very much throughput on it: I have 400 RUs provisioned, and even really large-scale workloads using the change feed aren't generally going to need more than a few thousand RUs for this.
I actually prefix the id value here with something unique for each of my microservices. If you notice, this one is prefixed with "alert"; if I come down and look at the other documents, they're prefixed with other values. I've run my alert microservice, but I've also run the current location microservice, and I store these in the same leases collection; I'm able to distinguish which is which by the prefix that I put in front, per Cosmos DB collection.
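In a Functions app, the leases container and that per-function prefix come from the Cosmos DB trigger binding. A function.json for the alert function might look roughly like this; the connection setting, database, and container names are assumptions, and the property names follow the 3.x Cosmos DB trigger binding:

```json
{
  "bindings": [
    {
      "type": "cosmosDBTrigger",
      "name": "input",
      "direction": "in",
      "connectionStringSetting": "CosmosDBConnection",
      "databaseName": "flights",
      "collectionName": "locations",
      "leaseCollectionName": "leases",
      "leaseCollectionPrefix": "alert",
      "createLeaseCollectionIfNotExists": true,
      "feedPollDelay": 5000
    }
  ]
}
```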
Let's take a look at the code now and go a little deeper: take a look at that materialized view function. Coming down here, the logic in this function is pretty simple. Just like with the email alert Azure Function, every single time there's a write to Cosmos DB, to the locations container, we trigger this tiny microservice. Every time there's a change, we loop through all the changes, take the flight number and make it the id value of the document, and then go and write that document, persisting it in the current locations container. So the logic here is pretty simple, and the great thing is that it's completely independent from our other microservices, like the one that sent the email alerts.
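A toy model of that function body, with a plain dict standing in for the current locations container (an upsert call in the real SDK). Property names are illustrative:

```python
def on_locations_changed(changes, current_locations):
    """Toy model of the materialized-view function: for each changed location
    document, use the flight number as the id and upsert it into the
    current-locations store, overwriting the previous location."""
    for doc in changes:
        item = dict(doc)
        item["id"] = item["flightNumber"]   # point-readable by flight number
        current_locations[item["id"]] = item  # stands in for an upsert call
    return current_locations
```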
That lease collection prefix, "current location", is specified right here. You specify it separately for each function, and this is how we're able to distinguish the bookmarks of different functions from one another. So we again have separate bookmarks per microservice, and each of these bookmarks is completely independent of the others. They're stored in Cosmos DB, a highly available service, so you're never going to lose these bookmarks; they're available with five nines of availability, and this is what makes the change feed processor library incredibly robust.
Let's imagine this scenario: I have three partitions. When I started today, I created one microservice that has a bookmark for which document it last processed, and as it processes more documents per partition, it progresses further along in reading from the change feed. Later on in the session, I added microservice two, the one that precomputed the materialized view every time we did a write and persisted it back into Cosmos DB. This microservice has separate, independent bookmarks. As microservice one moves along, or if I need to stop a certain microservice, these can each persist and continue to run independently from one another, and their only dependency is on Azure Cosmos DB, which is a highly available, highly scalable service.
So, two down, one to go: we've built two of the microservices for our application. We send alerts if an airplane enters a no-fly zone, and we also display a real-time materialized view of the different airplane locations, which we call into with a web app. The next thing we're going to do is display the upcoming arrivals board. You want to think of this as basically the arrivals board in the airport: when you go and look at all the planes that are going to land at Orlando Airport, it shows that list, sorted. Now, this arrivals board might not be limited to being displayed in airports; it might be displayed in air traffic control towers. So think of this as something that, just like our map service, is going to be accessed very frequently.
Now, hopefully this is an easy question. We have two options for building out this arrivals board. We can recompute what the arrivals board is every time we need to get that value, or we can build out a materialized view that persists the value of the arrivals board, so that we don't need to repeatedly calculate it every time a service needs to access it. Which one do you think it is? You can just shout it out. It's B, right; really, really simple. We can do this the same way as before. Of course, we'll need to do more processing here; our function is going to be a little bit more complex. In this case, we're likely going to want the same strategy, where we make the id value of this document something we can use to look it up, so we put MCO, Orlando Airport, as the id value. Then if we want to do a ReadItemAsync in Cosmos DB, we can look up this document by its id value, which is MCO, and supply the airport partition key, so we'll know which partition it's in and we know the id value. This read from Cosmos DB is just going to cost one RU.
Now, in this case, we're going to be recomputing it every single time we run our Azure function. For a lot of IoT applications, or a lot of retail applications that use the change feed to build a real-time materialized view, they cannot afford to have the materialized view lag behind by more than a few seconds. If, for example, you had a gaming leaderboard, it has to remain pretty close to perfectly up to date. It can't lag behind by an hour; if it does, people aren't going to want to play your game, because the leaderboard won't be up to date.
Now, in this case, does it matter that the arrivals board is perfectly up to date? It might; we don't know. Likely not, in practice. For this example, we assumed that everything had to be in real time and the location of each plane had to be really precise, so for the map, it was important that it was real time. For the arrivals dashboard, if it needs to be in real time, then you'd want to go and build that materialized view using the change feed, and every time we do a write to the locations container, go and rebuild that materialized view. But of course, if this doesn't need to be in real time, you could have some sort of job run separately from the change feed, just issue a query every 5 minutes, and then persist that value somewhere. That would be another alternative if you don't have real-time requirements.
So we'll partition on airport for a really quick lookup, and same as before, we'll go and persist this value in the arrivals container. Any web app that needs to access this is going to call into the arrivals container. We can still issue ad hoc queries against the locations container, but really anything we need to access for our upcoming arrivals is going to go right to the arrivals container, for one RU, really quick.
So, to wrap up: we were successful. We built out an app that ingested data from different airplanes, and we built three microservices, which all did separate and independent tasks, to fill out the business software for our airline.
After a few years of running this super successful app, we have accumulated over 10 terabytes of data. We've been ingesting a lot of data into the locations container, and we just left it there. Our airline is required to keep plane location history for an infinite amount of time, so we can never delete this data. The development team decided to keep it all in that same Cosmos DB container, and after a couple of years, it grew to 10 terabytes. What should we do? Any ideas?
Yeah, move it to Blob storage, exactly. We're using the change feed very heavily, and we have three microservices here that have been running, but we want to add on another microservice, another Azure function, that's going to read from the change feed and then do a write to Azure Blob storage. It will process the changes as they're written to Cosmos DB and keep Blob storage synchronized with Cosmos DB. We'll then set a time-to-live, or TTL, property on the Cosmos DB container of, say, 60 days, so every time a document is written in Cosmos DB, the countdown begins; after 60 days, it will automatically be deleted from Cosmos DB. But we'll have retained the data forever in Blob storage, and it will be much cheaper that way as well, if you have, let's say, hundreds of terabytes or petabytes of data.
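A quick sketch of that archive pattern, assuming a hypothetical document shape (`planeId`, `timestamp`); the dict here stands in for a real Blob container, where the actual code would call `upload_blob` from azure-storage-blob.

```python
import json

blob_store = {}  # stand-in for an Azure Blob container

def archive_changes(changes):
    """Change-feed handler: persist each changed document as a JSON blob."""
    for doc in changes:
        blob_name = f"{doc['planeId']}/{doc['timestamp']}.json"
        blob_store[blob_name] = json.dumps(doc)

# Meanwhile, the Cosmos DB container gets a default TTL of 60 days
# (TTL is expressed in seconds), so hot data ages out automatically
# after it has been archived:
CONTAINER_DEFAULT_TTL = 60 * 24 * 60 * 60

archive_changes([{"planeId": "abc-123",
                  "timestamp": "2019-11-04T22:00:00Z",
                  "altitude": 35000}])
```
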
So our app is successful, and for a while we don't really do that much. Until the son of Contoso Airlines' CEO really wants a software engineering internship. He's not a very good developer, but we decided to hire him, and he's messed up a lot of other pieces of code that we have, so we decided we're going to give him something really simple. We're going to have him update that email alert microservice: instead of sending an alert for the no-fly zone over the Orlando Convention Center, we're going to update it to send alerts if you enter the no-fly zone near Orlando Airport instead.
Let's take a sneak peek at the changes that he made. What this intern did was come into the email alert function; here's the business logic that calculates whether the plane is in the no-fly zone, and he went and made some changes. He couldn't get the email alert to work, so he decided to make it really simple and just return something like that. We don't need to be experienced software engineers to know that every time this runs, it's going to say the plane is in the no-fly zone. And of course, he's been struggling his entire internship, so he's really happy about this; he doesn't even test it out, he just goes and deploys it into production.
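For contrast, a sketch of what the check should look like: an actual bounding-box test rather than an unconditional "true". The coordinates are made up for illustration, not the demo's real Orlando geofence.

```python
# Hypothetical no-fly zone near Orlando Airport (illustrative bounds).
NO_FLY_ZONE = {"lat_min": 28.40, "lat_max": 28.46,
               "lon_min": -81.35, "lon_max": -81.29}

def in_no_fly_zone(lat, lon, zone=NO_FLY_ZONE):
    """Return True only when the plane is inside the bounding box."""
    return (zone["lat_min"] <= lat <= zone["lat_max"]
            and zone["lon_min"] <= lon <= zone["lon_max"])

# The intern's version, by contrast, was effectively:
def intern_in_no_fly_zone(lat, lon):
    return True  # every plane "enters" the zone -> an email flood
```
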
So if I come back, I'm going to stop my current location microservice and run that email alert microservice. Any guesses what's going to happen? What will probably happen is that Outlook is going to say I can't send any more emails and lock me out for a little bit. That will run, and very shortly I'm going to be flooded with emails here. Essentially, this is going to cause a lot of spam, and customers are going to be very unhappy, right? We're going to be sending alerts to people saying the plane is in a no-fly zone, and things like that, and that's really not ideal. So we obviously need to take down that microservice very quickly, because it's going to run, process the changes as they are written, and send a ton of email; it's going to be pumping out emails very, very shortly.
Now we obviously have to take that app down, and when we do that, we're going to really rely on that separate-leases concept. The idea of having separate leases for each function in Cosmos DB is super critical here. Obviously, there was an issue with the email alert microservice, so we have to take that down; we need to stop that function and stop it from sending out alerts. But we still want the current location microservice, and the other microservices, to continue to run as normal. The great thing about this microservices architecture is that each microservice only has a dependency on Cosmos DB. Cosmos DB is a highly available service that gives you 99.999% availability, so you can take that dependency on Cosmos DB to ensure that your microservices architecture isn't as weak as its weakest link, whether that link is Functions going down or some intern deploying bad code.
Now, a couple of best practices about reading from the Cosmos DB change feed. There is no extra cost for reading from the change feed; a lot of times customers think the change feed is some premium Cosmos DB feature, and that's very far from the truth. It's a feature that every Cosmos DB customer should really know quite well and use heavily, and it's really not much different than a regular read from Cosmos DB. The change feed is also enabled by default, so if you were one of our first Cosmos DB customers years ago, and you decided after this session that you want to start reading from the change feed, you can go and do that. If the document is still in your container, it's still going to be present in the change feed. You can rewind the container to any point you like, whether it's the beginning or a specific time, and read from there. That functionality, just as an example, is something you don't get with Functions; Functions is designed as a simple way to read from the change feed, so for the simple microservices that we built, Functions is going to be perfect. For more advanced control and functionality, I do recommend deploying the change feed processor library yourself, and you'll get capabilities like that as well. But all of this is very robust: no matter how you use the change feed processor library, you'll get those guarantees of high availability and at-least-once semantics.
The change feed contains the latest version of every document. In this case, we were doing writes to Cosmos DB; we weren't doing a lot of updates, just writes. If we were doing updates, only the latest version of the document would be in the change feed, and that's really important to remember when you're reading from the change feed: for real robustness, and to guarantee you're processing every single change, you do want those changes to be writes. Intermediate updates and deletes, as I mentioned, are not in the change feed. For deletes, the pattern we recommend is setting an isDeleted property in your document. When you want to delete the document, set isDeleted to true, and then set a document-specific time-to-live, or TTL, on that specific document. The update will trigger the change-feed-triggered Azure function, and then after a specific amount of time, say 24 hours, the document will be deleted from Cosmos DB and the delete will have been processed.
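That soft-delete pattern can be sketched in a few lines; the property names (`isDeleted`, `ttl`) are the ones described above, and the 24-hour window is the example value from the talk.

```python
def soft_delete(doc, ttl_seconds=24 * 60 * 60):
    """Flag a document as deleted instead of hard-deleting it.

    Hard deletes never appear in the change feed, but this update does,
    so downstream consumers can observe the deletion. Cosmos DB's
    per-document TTL then purges the document after ttl_seconds.
    """
    doc["isDeleted"] = True   # downstream readers treat this as deleted
    doc["ttl"] = ttl_seconds  # per-document TTL, in seconds
    return doc

doc = soft_delete({"id": "abc-123", "airport": "MCO"})
```
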
Now, we didn't talk at all about handling errors today, just because we only had 45 minutes, but this is obviously something that's incredibly important when you're reading from the change feed, especially if you're doing something like sending out email alerts, or processing documents and writing them to another data source like Azure Blob storage. If, for example, we hit some error or exception in our function code, we would surely want to use that try/catch pattern, where we catch the error, record which document failed, and store it in a queue somewhere, so that we can go in later and reprocess that document, either manually or through some other code, and perform the necessary business logic. This allows your function to continue processing later documents, but still retain which documents were not processed correctly, so that you keep the guarantee that you're processing every document written to your locations container.
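A minimal sketch of that catch-and-dead-letter pattern. The handler and the in-memory queue are placeholders; a real app might park failed documents in an Azure Storage queue instead.

```python
dead_letter_queue = []  # stand-in for a durable queue

def process_changes(changes, handler):
    """Process each change; on failure, park the document for later retry."""
    processed = 0
    for doc in changes:
        try:
            handler(doc)
            processed += 1
        except Exception as exc:
            # Record which document failed and why, then keep going
            # so one bad document doesn't block the rest of the batch.
            dead_letter_queue.append({"doc": doc, "error": str(exc)})
    return processed

def handler(doc):
    if "altitude" not in doc:
        raise ValueError("malformed location document")

ok = process_changes([{"id": "a", "altitude": 35000}, {"id": "b"}], handler)
```
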
The big idea that I want to emphasize is this: when you're building out a microservices architecture and you're reading from the change feed in Cosmos DB, you want each specific function to have a separate, small, independent task. If I had wanted to in this demo, I could have combined the email alert microservice, the materialized view microservice, and the write-to-Azure-Blob-storage microservice all into one giant Azure function. But then what would have happened when we hired our intern? We would have had to take our whole service down. That's the advantage of this microservices architecture: when you read from the Cosmos DB change feed to event-source these microservices, you get an incredibly robust, highly available, and highly scalable solution.
I hope you enjoyed the session today. These are some upcoming Cosmos DB sessions to learn more; I definitely recommend them, and they will be led by my colleagues. The change feed examples and best practices for reading from the change feed are all in our new workshop materials, so if you go to aka.ms/CosmosDBWorkshop, we have different labs available. There is a change-feed-specific lab that goes through the change feed in depth, and there's also a scenario-based IoT lab, which, when you go through it, is going to look a lot like the one that I ran, except it's going to be with cars instead of planes. Same best practices, and it allows you to get hands-on with a lot of these concepts.
So thanks for attending today, and have a great rest of the night. If I'm unable to answer your question now, I'm actually going to be going right to the Cosmos DB booth after this, and I'll be happy to cover it there. To incentivize questions, for anyone that answered a question correctly today, I have some socks, so feel free to stop by, and if I don't give you…
