[MUSIC]
>> There's a ton of excitement
happening here at
Microsoft Build 2019,
and the Build live stage
is no exception.
What's new with Cosmos DB?
With Principal Group
Manager Software Engineer,
Kirill Gavrylyuk. Welcome.
>> Nice meeting you, Rutha.
It's great to be here.
>> Great. So for those in the
audience that are new to Cosmos DB,
what is it, and when
should one use it?
>> Cosmos DB is Microsoft's
globally distributed multi-model
database service.
>> Okay.
>> It is fully managed.
It offers you not only
availability guarantees,
but throughput and
low latency guarantees,
which is single-digit
millisecond latency.
Thus, it's a great choice
for any application
that has large volumes of data
and needs high scale
and high performance.
It is instantly elastic,
which allows you to save
a ton of money compared
to provisioning virtual machines.
It is infinitely scalable,
so you can go from 100 requests per
second to tens of millions of
requests per second,
and at the same time,
it is schema-less, which is great for
developer productivity because
you can start with any data,
you can change it anytime,
you don't have to
pre-define schemas as
we do in traditional
relational databases.
So it spans the spectrum:
you can start small,
you can start for free,
and you can grow to
tens of millions of requests per
second for mission-critical apps.
>> Great. I'm used to
relational databases,
but I like the infinite
scale of Cosmos,
global distribution,
performance guarantees.
How should I go about
moving my data to Cosmos?
Then also, how should I
design my data model?
>> That's a great question.
First, of course,
if your data needs to
be highly normalized,
if you have very intensive queries,
relational databases are great.
Now, of course, if
you need high scale,
if you need flexible schemas,
if your data changes,
if your data comes at high velocity,
that's where you need
to start thinking
about other databases we offer.
When you move to Cosmos DB,
there are a few things to think about,
and we have great recommendations
about them, by the way.
I'm not going to go into all of
them in depth here, but you can.
It's important to think about, one,
denormalize your data so that
all the related data stays together.
Cosmos DB stores data in containers,
and containers are not tables.
There are much fewer
containers, typically,
and all the related data that you
want to query at the same time,
or want to work
with at the same time,
is better kept in a single container.
I can show you right
here, on the screen.
Here, I have an earthquakes database
that stores earthquake information.
As you can see, these are
actually trees, not just tables,
so you have lots of nesting going on.
That's great when you want
to get all the data
at once, in one request,
so you don't need to do
multiple requests.
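[Editor's note: the denormalized container model described here can be sketched in plain Python, with dicts standing in for Cosmos DB JSON documents. The field names below are illustrative, not taken from the demo.]

```python
# Instead of splitting related data across tables, one document in a
# Cosmos container keeps everything you query together nested in place.
earthquake_doc = {
    "id": "quake-2019-001",
    "region": "Seattle",
    "magnitude": 4.2,
    # Related data is nested inside the document, not joined from
    # a separate table:
    "sensorReadings": [
        {"station": "STA-01", "amplitude": 0.8},
        {"station": "STA-02", "amplitude": 0.5},
    ],
}

# A single read returns the document and all of its related data at
# once, so no join or second request is needed:
def fetch_with_related(doc):
    return doc["region"], [r["station"] for r in doc["sensorReadings"]]

region, stations = fetch_with_related(earthquake_doc)
```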
>> Then, also, how complete
is the Cosmos SQL dialect?
Looking at User Voice,
when should we expect
Skip and Take, Distinct,
and other SQL features?
>> That's a great question. Cosmos
DB is a multi-model database.
It offers various APIs,
starting from open source NoSQL APIs,
like MongoDB or Cassandra API
or Gremlin, and of course,
we offer a flavor of SQL,
which is both a subset and
a superset of SQL,
tuned toward semi-structured,
potentially nested data.
It's a fairly complete coverage
when it comes to queries.
At Build, we're excited
to announce that we
finally have Skip and Take.
Awesome.
We have Offset and Limit in,
we have Distinct,
and Group By is just
around the corner.
So we keep growing and growing.
The goal here is to offer
you full ANSI SQL.
Of course, not T-SQL,
but full ANSI SQL for
your queries and SQL statements.
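[Editor's note: the Skip/Take feature mentioned here surfaces in Cosmos SQL as an OFFSET ... LIMIT clause. A sketch, with the paging semantics simulated over a plain Python list; the container alias and field name in the query string are illustrative.]

```python
# A paged Cosmos SQL query looks roughly like this:
paged_query = """
SELECT DISTINCT c.region
FROM c
ORDER BY c.region
OFFSET 10 LIMIT 5
"""

# The Skip/Take semantics, simulated in plain Python:
def skip_take(rows, skip, take):
    # OFFSET discards the first `skip` rows; LIMIT keeps the next `take`.
    return rows[skip:skip + take]

rows = list(range(100))
page = skip_take(rows, 10, 5)  # rows 10 through 14
```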
>> Awesome. I've heard you announced
Apache Spark built into Cosmos DB.
What is that about?
And what's the [inaudible]
for Spark developers?
Will you be providing a demo for Spark?
>> Yes. Yeah, absolutely. This is
a very exciting thing
for us because now,
we can take advantage of Spark,
a very popular analytics framework,
and bring it into the database.
We're the first managed
database to offer you Spark
built into the database,
which gives you rich
analytics right there,
and can take advantage of
things like native index and
other performance optimizations
within the database.
You can create a Cosmos account
with Spark enabled, just like before,
by going to Azure portal
and creating your account,
or using an ARM template.
Give it a name.
You can select any of the data APIs,
and now, you can enable Apache Spark.
Now, you can run your Spark jobs
directly in the database.
With Spark, we're also offering
you built-in support
for Jupyter Notebooks.
These are Jupyter Notebooks,
so you can take advantage of the rich
ecosystem of Jupyter Notebooks,
but they run inside a database,
and they're integrated
in our Cosmos Explorer.
Right here, I can
create a new notebook,
and I can use Jupyter Notebooks
with any of the Cosmos data APIs.
For example, right here,
I can run SQL queries.
I can write Cosmos SQL queries
directly in the notebook.
I can execute them.
I can get my earthquakes.
Here, I have a little
more interesting notebook
that uses geospatial queries to
fetch earthquakes within
the vicinity of a location,
in this case, Seattle.
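[Editor's note: the geospatial filtering in this notebook can be sketched in plain Python with the haversine great-circle formula; Cosmos SQL offers built-in geospatial functions for this, but the coordinates, radius, and document shapes below are made up for illustration.]

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points in kilometers.
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

seattle = (47.61, -122.33)
quakes = [
    {"id": "q1", "lat": 47.50, "lon": -122.30},  # near Seattle
    {"id": "q2", "lat": 34.05, "lon": -118.24},  # Los Angeles, far away
]

# Keep only earthquakes within 100 km of Seattle:
nearby = [q for q in quakes
          if haversine_km(seattle[0], seattle[1],
                          q["lat"], q["lon"]) < 100]
```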
So let's take a look how
safe we are here in Seattle.
There are actually quite a few
earthquakes happening all the time,
in case you didn't know.
Thanks to our rich ecosystem,
I can do things like
visualizing query results.
Here, visualizing them on the map.
Hopefully, this impresses on
you the importance of using
globally distributed databases to
stay safe from earthquakes
and other disasters.
At the same time, we let you
take your existing Spark jobs,
and many of them are
written as notebooks,
so you can take Jupyter Notebooks with
Spark and run them
as is in Cosmos DB.
Here, we have one of the Spark jobs
used by one
of our customers,
and it does a fairly complex join
across multiple Cosmos containers.
It uses Spark SQL right here,
and it does a join across
the sales data and
dimensions spread across
multiple Cosmos containers.
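[Editor's note: in the demo this join runs as Spark SQL inside Cosmos DB; a sketch of the same cross-container join idea, simulated with plain Python dicts. The sales/dimension field names are made up.]

```python
# Rows from two hypothetical Cosmos containers:
sales = [
    {"productId": "p1", "amount": 100},
    {"productId": "p2", "amount": 250},
]
products = [  # the "dimension" data
    {"productId": "p1", "name": "Widget"},
    {"productId": "p2", "name": "Gadget"},
]

# Roughly equivalent Spark SQL:
#   SELECT p.name, s.amount
#   FROM sales s JOIN products p ON s.productId = p.productId
by_id = {p["productId"]: p["name"] for p in products}
joined = [{"name": by_id[s["productId"]], "amount": s["amount"]}
          for s in sales]
```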
Spark SQL runs inside Cosmos DB now,
so you can take advantage
of the native index.
That's much faster,
and no data movement is required,
so you don't have to fan
a lot of requests out to
a bunch of different Cosmos
partitions from somewhere else.
It's all inside the database.
You can take advantage
of other things in the
Spark ecosystem, like
machine learning.
Here, I have a notebook using
PySpark.ml that does
prediction based on this data.
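[Editor's note: a minimal stand-in for the ML prediction mentioned here, using a hand-rolled least-squares fit in pure Python. In the demo this would be a pyspark.ml pipeline; the depth/magnitude data below is fabricated for illustration.]

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = a * x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

depths = [10.0, 20.0, 30.0, 40.0]   # hypothetical feature
magnitudes = [3.0, 3.5, 4.0, 4.5]   # hypothetical label
a, b = fit_line(depths, magnitudes)

# Predict the label for a new data point:
predicted = a * 50.0 + b
```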
You can use GraphX.
You can use any of
the modules in Spark,
take advantage of them,
and run them inside the database.
It's great because it gives you
analytics on your operational
data in real time.
You don't need to move data,
there's no need to do ETL,
and you can take advantage of
things like global distribution.
You can run your Apache Spark
anywhere in the world,
collocate it with the database,
giving you all those low latencies.
You can run these Spark queries on
workloads that come at millions,
or tens of millions of
requests per second,
and still get the same low latency
when it comes to queries.
Milliseconds for your reads,
[inaudible] 10 milliseconds for
your writes, and high availability.
So, altogether,
we're really excited to see what
people are going to do with it.
>> Exciting. Great. Your team
invested heavily in SDKs,
and I heard you're
announcing GA for the
new v3 .NET and Java SDKs this month.
What are the main benefits?
Also, what happens to
applications that use
the current Cosmos SDKs?
>> Later this month,
we are indeed going to
announce GA for our .Net,
and our Java V3 SDKs,
which comes with certain
programming model changes.
It brings you up to
30 percent better performance.
We're actually using the
SDKs in our own service,
and we were able to save
significantly thanks to that.
It offers you a streaming API.
It offers you an improved
programming model.
The Java SDK uses the
Reactor framework for async,
and the .NET SDK
is now done properly,
taking full advantage of all the
best practices in .NET today.
It targets .NET Standard 2.0.
It runs on Linux and Windows,
and sometimes performs even better
on Linux than on Windows.
Yes, the programming
model changes are there.
Of course, we are going to support
the existing SDKs for a while,
and we're going to help
customers migrate.
We are not going to turn them off
until customers have
comfortably migrated.
We're going to offer potentially
some of the tooling,
some of the wrappers, to
make it easier to take
advantage of the new SDKs
and the new performance gains.
>> Awesome. Global distribution
across regions sounds expensive.
How much does Cosmos DB cost?
>> That has always been
our struggle because we
cover this large spectrum.
You can start free.
You can run with lots of
containers, lots of data,
and up to hundreds of
requests per second,
all under $24 a month,
which is not high.
>> Not bad.
>> At the same time, you can go nuts.
You can go across 20 regions.
You can run workloads of 10 million
requests per second, and of course,
those cost a little bit more,
but it's a fairly flexible system.
The important thing is to
follow best practices,
and we optimized our defaults now
so that it always
starts with low cost,
and you can build and
increase as you go,
increase the scale or
number of regions.
We offer a lot of advice
to help people save costs.
There's instant scaling,
instant elasticity:
you can go up and down
10x in a matter of seconds.
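[Editor's note: a back-of-the-envelope sketch of why that elasticity saves money. The per-100-RU/s hourly rate below is an assumed illustrative figure, not an official Cosmos DB price.]

```python
# ASSUMED illustrative rate, not an official price:
ASSUMED_RATE_PER_100_RUS_PER_HOUR = 0.008

def hourly_cost(provisioned_rus):
    # Cost of keeping `provisioned_rus` RU/s provisioned for one hour.
    return provisioned_rus / 100 * ASSUMED_RATE_PER_100_RUS_PER_HOUR

# Scale 10x up for a 2-hour daily peak, then back down, instead of
# provisioning for the peak all 24 hours:
baseline, peak = 1_000, 10_000
elastic_daily = 22 * hourly_cost(baseline) + 2 * hourly_cost(peak)
static_daily = 24 * hourly_cost(peak)
```

Under these assumed numbers, scaling down off-peak costs a fraction of provisioning for the peak all day.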
You should always take
advantage of that to save
costs, because you don't get
this with relational databases.
You do not get this
with other systems.
All these things are there
for customers to save cost,
and you can start very low.
We have a very beefy 30-day trial
with unlimited renewals,
up to 20,000 RUs,
I think, at Try Cosmos DB.
So go right here.
You can try free,
and you get your Cosmos DB for free,
with lots of capacity for 30 days,
and with unlimited renewals.
>> With all these new announcements
happening with Cosmos DB,
what's the most exciting for you?
>> I think the most
exciting, of course,
is Spark and Notebooks
because it just opens up
such a rich spectrum of
opportunities for developers today.
Now, you don't have to choose
between an operational database
and an analytical database.
It's all one and the same.
You can run your
analytical workloads,
and Spark has proven to be such
a flexible, scalable
analytics framework.
We're happy to take
advantage of that.
>> Awesome. Are there
any more announcements for
the developers listening in?
>> Lots of small things.
Take a look at our blogs.
We go into detail there.
We're doing better tooling.
We've also been optimizing.
It may go unnoticed,
but we reduced the cost of
many queries 100x. That's right.
It's 100x cost reduction
for many of the queries.
Also, the performance of
aggregate queries has improved
dramatically with
the Build deployments.
Lots of goodness there.
Take a look at our blogs.
>> Thank you. Thank you, Kirill,
for joining us.
We'll see you all after the break.
>> Thank you, guys.
>> Thank you.
[MUSIC]
