>> Hey friends. I'm Scott Hanselman,
and it's another episode of Azure Friday.
I'm here with Jack Mondal,
and we're talking about Gremlin for
Cosmos DB and the Graph
API, which is very exciting. What is Gremlin?
>> So, Gremlin is
an open source query language
to query graph structured data.
As you know, Cosmos DB supports a variety of
different APIs, for example, MongoDB and Cassandra.
Similar to that, we also have Gremlin.
So, now we can treat Cosmos DB as a graph database,
and shoot any kind of graph query at it.
>> So if I meet a person out there
on the street who knows Gremlin,
they may have used it on different databases?
>> Yeah.
>> They're going to feel very comfortable.
>> Yeah.
>> So they don't have to learn Cosmos?
>> No. So, if you deploy Cosmos DB as a graph database,
we will give you
a web socket endpoint
that understands the Gremlin query language.
So, not only Gremlin query language,
if there are any tools that are out
there that know how to talk to a Gremlin endpoint,
they can also work seamlessly.
So, any open source tool that's based on Gremlin,
can now talk to Cosmos DB.
>> That's cool. So, this person
that I find who knows Gremlin,
they might have a toolkit that they've built
up of different things that talk to Gremlin?
>> Yeah.
>> That's cool.
>> Yeah.
>> Is this hard to set up?
>> It's just a click of a button.
So, you just go to Azure portal and just say,
create a Cosmos DB account with Gremlin API. That's it.
>> Cool. Do you have some slides?
Did you have a demo? What do you want to show?
>> I have some slides,
and I'll also interleave some demos.
>> Let's do it.
>> So, as you already mentioned,
today we are here to talk about Gremlin,
the query language for Cosmos DB's Graph API.
>> Of course.
>> Today, I'll basically cover
the anatomy of a Gremlin query.
Like, how does a Gremlin query look,
and why is it different from
SQL or other traditional declarative query languages?
>> Okay.
>> And then I'll talk about,
what are different categories of
operations that Gremlin supports.
So that if you are trying
to write a graph exploration task,
you know which type of
steps or operations to pick from, and from which categories.
>> Okay.
>> Right? So, here is a quick example.
A Gremlin query looks like what we have listed down here.
So it starts with g, which refers to your graph.
And then, you list down a set of operations that you want
to perform on the graph.
The first key difference
that I want to point out here is,
it's very procedural in nature, unlike
declarative query languages like
SQL and its derivatives.
So what it means is that,
when you're writing a Gremlin query,
you are not only specifying the intent of the query,
but you're also laying
out how the query will be executed.
>> Okay.
So that means that the order of the steps matters?
>> Yeah.
>> And that the graph is changing
as it moves through a pipeline?
>> Yeah. I'll come to that actually.
So, as you have said,
Gremlin follows a data flow-style execution.
So, the way to think of it is,
as you are chaining these different steps,
data flows from one step to another.
So, I think the way
to think about a step is that it takes a set of inputs,
generates a set of outputs, and
passes those as input to the next step.
And the entire execution happens in a pipelined fashion.
So that means, as long as
one step can produce one output record,
it can pass that on to the next step, and
the execution can continue in a non-blocking fashion.
Of course, there would be steps where you
want to do certain operations in a blocking fashion.
>> Sure.
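
The pipelined execution described above can be sketched with plain Python generators. This is only an illustration of the idea, not how Cosmos DB or any Gremlin server is implemented; the step names `emit`, `shout`, and `ordered` are made up for the example. The log shows that each record flows through the non-blocking steps one at a time, while the blocking step waits for all of its input.

```python
from typing import Iterable, Iterator, List


def emit(names: Iterable[str], log: List[str]) -> Iterator[str]:
    """Source step: yields one record at a time."""
    for name in names:
        log.append(f"emit {name}")
        yield name


def shout(stream: Iterable[str], log: List[str]) -> Iterator[str]:
    """Non-blocking step: handles each record as soon as it arrives."""
    for name in stream:
        log.append(f"map {name}")
        yield name.upper()


def ordered(stream: Iterable[str], log: List[str]) -> Iterator[str]:
    """Blocking step: must consume every record before emitting the first."""
    buffered = sorted(stream)
    log.append("sort all")
    yield from buffered


log: List[str] = []
result = list(ordered(shout(emit(["scott", "amy"], log), log), log))
print(result)
print(log)
```

In the log, "emit" and "map" entries alternate record by record, and "sort all" appears only after both records have passed through the earlier steps.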
>> Now that we understand like
the basic anatomy of a Gremlin query,
let's look at some of the different types of steps,
or different operations that you want to perform.
So, first we have map steps.
With map steps, as the name suggests,
the operator maps an input object of type A,
and converts it to an output object of type B.
So, for example, as
you are walking the graph you are on a vertex,
now you want to go to an edge,
or you want to go to a property.
So, you're basically converting, you're taking
a set of vertexes as an input and converting
them to a set of different objects.
>> Okay.
>> So, here I can take a quick example.
So, I have a very simple toy graph
of Satya and some of his direct reports.
>> Okay. Good thoughts.
>> Yeah. So, if you look at this step here,
so it basically says, g.V('Satya').
So give me the vertex of Satya.
And then this step, outE,
it's basically taking Satya's vertex
as an input and then converting
it to a set of outgoing edges.
So you'll get a set of edges.
So here they are: the people that report to him,
and also some of the things that he's working on.
>> Like this book.
>> Yeah.
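
The map step in this example can be sketched in a few lines of Python. This is a toy model of the graph from the demo, not the Cosmos DB API: the edge labels `manages` and `worksOn` are assumptions, since the transcript does not give the labels used on the slide. The helper `out_e` plays the role of the `outE` step, turning input vertexes into their outgoing edges.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    source: str
    label: str   # labels are hypothetical, chosen for the example
    target: str


EDGES = [
    Edge("Satya", "manages", "Scott"),
    Edge("Satya", "manages", "Amy"),
    Edge("Satya", "worksOn", "Book"),
]


def out_e(vertices: List[str], edges: List[Edge]) -> List[Edge]:
    """Map step: each input vertex is converted to its outgoing edges."""
    return [e for v in vertices for e in edges if e.source == v]


satya_edges = out_e(["Satya"], EDGES)
for edge in satya_edges:
    print(edge)
```

The input is a list of vertexes and the output is a list of edges, which is the type change the map step describes.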
Next, we're going to talk about filter steps.
The way to think about filter steps is,
you are scoping down the amount of
data a Gremlin query is touching during execution.
Right. So, filter steps are useful
when you want to make your query more
efficient and make it low latency.
So, I'll take an example here.
>> So it sounds
like the sooner that I can filter something out,
the faster, the better the whole thing is going to go.
>> Yeah. So, the way to think about it is,
you can write two different Gremlin queries
that output the exact same results.
>> Yeah.
>> But if you can apply
some filter up front in the query,
then your execution would be much faster.
>> So you wouldn't want to like, for example,
sort the entire database,
the entire graph rather,
and then filter everything,
minimize it and then sort.
>> Yeah.
>> Okay.
>> So I can take an example.
>> So for filtering.
>> Yeah. Sorry.
>> No worries.
>> Yeah.
>> So which one is a good filter?
>> Yeah.
So, let's take the example here.
So, in this example,
what you're trying to find out is,
who are the people that report to Satya.
>> Right.
>> Right. And as you see upfront,
we are specifying g.V(). So,
this means this query is inefficient:
it's taking the entire vertex set as an input.
>> So you say, take from the graph G,
take all the vertexes,
and then look at all the reports,
and walk backwards up to find the vertex.
>> Yeah. And so, even though,
if you run this query,
it outputs Amy and Scott.
>> Sure.
>> One way to apply a filter here would be,
if you know for a fact that
only EVPs report to Satya,
you can specify that as a filter,
and you can say "Title".
>> And now I've just cut this down
from many many thousands to a dozen.
>> Yeah. So, this query
will output the exact same results,
but inherently, it's running much faster.
And in terms of Cosmos DB,
it's not only running with low latency,
it is also costing you fewer RUs.
So, when you are writing this query,
it's recommended you look up
all the different filtering steps that you can apply,
and make your query more efficient.
>> That's a really good point that you just made.
I want to make sure we double-click on it.
Because even though Cosmos DB is so fast,
it can hide complexity from you.
>> Yeah.
>> You just did a query, and to us it
felt like they both cost equally.
>> Yeah.
>> Because Cosmos DB is so fast.
>> Yeah.
>> But the second one costs
less money because less work was done.
>> Yeah.
>> That's a really great point. Okay.
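
A rough way to see why the early filter is cheaper is to count how many vertexes each version of the query has to examine. The sketch below is a toy model only: the padding vertexes, the non-EVP titles, and the assumption that the title filter is answered by an index (so it is not counted as traversal work) are all made up for illustration, and the count is a stand-in for work done, not an RU calculation.

```python
from typing import Dict, Iterable, List, Tuple

# Toy vertex set: name -> title. Everything except the EVP title is hypothetical.
VERTICES: Dict[str, str] = {"Satya": "CEO", "Scott": "EVP", "Amy": "EVP", "Jason": "IC"}
VERTICES.update({f"employee{i}": "IC" for i in range(1000)})

REPORTS_TO_SATYA = {"Scott", "Amy"}


def reports_of_satya(candidates: Iterable[str]) -> Tuple[List[str], int]:
    """Check each candidate vertex; return the matches and how many were examined."""
    examined = 0
    found = []
    for name in candidates:
        examined += 1
        if name in REPORTS_TO_SATYA:
            found.append(name)
    return sorted(found), examined


# Unfiltered: start from the entire vertex set.
all_result, all_examined = reports_of_satya(VERTICES)

# Filtered: assume an index returns only the EVPs before the traversal starts.
evps = [name for name, title in VERTICES.items() if title == "EVP"]
evp_result, evp_examined = reports_of_satya(evps)

print(all_result, all_examined)
print(evp_result, evp_examined)
```

Both versions return the same two people, but the filtered one examines only the EVP vertexes instead of the whole graph.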
>> Now, I want to talk about side-effect steps.
So, the way to think about side-effect steps is,
you are walking this graph,
and as you are walking you are collecting information as
a by-product of the work without
affecting the nature of the work. Right?
>> Okay. That makes sense.
>> Yeah.
>> Observing it didn't change it.
>> Yes.
>> Now, there are operations
that will change it if I observe it?
It will cause side-effects?
>> I'll talk about it a little bit later.
>> All right.
>> So, let's take this query.
Let me write it down.
>> What's our goal here?
>> So, let's say I execute this query.
>> Okay.
>> So, here I found out that Scott and Amy report to Satya.
>> Right.
>> And let's say, I'll
add another level of indirection here.
Now, out of Scott and Amy,
I found out that Jason now reports to Scott.
So, the goal of this query is,
I want to find out in this graph,
what are the vertexes,
or what are the reports that
are two steps away from Satya.
>> Okay.
>> Right? Now, in this case,
it's only outputting one vertex.
Right? But now let's say you want to find out,
as you're walking the graph,
you want to find out all the reports that you have found.
So, one way to do it is,
break this query into two,
where you say g.V with one out as one query,
and then two outs as another query,
get the results, and join them.
Another way would be to use a side-effect.
So, what you can do here is,
there is a side-effect step called "Store".
So you can store
the output of the first part of
the traversal into a variable X.
You can store the output of the second step
into the same variable.
>> Okay.
>> And then you can call a step called "Cap",
to output all the results.
>> Does it make a union of them?
>> Yeah. It makes a union of them.
So, if you look at the JSON output,
all three vertexes,
Jason, Amy, and Scott, will be there.
>> I see. But because it's a union,
they are unique, and I didn't
get two Scotts or two Jasons?
>> Yeah.
>> Because there aren't two in the graph.
>> Yeah.
>> Very cool.
>> So the other way to
write this query as I have mentioned,
would be to split it into two queries.
That will not only cause more latency,
but it will cost you more.
In this case with side-effect,
you can just walk the graph once,
and collect as much information as possible.
>> That's cool.
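
The store-and-cap pattern can be sketched in Python as follows. The query on the slide is not in the transcript; it presumably has a shape like `g.V('Satya').out().store('x').out().store('x').cap('x')`, and that is an assumption. The functions `out` and `store` below are toy stand-ins for the Gremlin steps of the same name, operating on a small in-memory adjacency map.

```python
from typing import Dict, Iterable, Iterator, List

# Toy graph from the demo: Satya -> Scott, Amy; Scott -> Jason.
ADJACENT: Dict[str, List[str]] = {
    "Satya": ["Scott", "Amy"],
    "Scott": ["Jason"],
    "Amy": [],
    "Jason": [],
}


def out(vertices: Iterable[str]) -> Iterator[str]:
    """Map step: move from each vertex to its outgoing neighbors."""
    for v in vertices:
        yield from ADJACENT[v]


def store(stream: Iterable[str], bag: List[str]) -> Iterator[str]:
    """Side-effect step: record each traverser in `bag`, then pass it on unchanged."""
    for v in stream:
        bag.append(v)
        yield v


x: List[str] = []
# One walk of the graph: store after the first hop and again after the second.
two_hops = list(store(out(store(out(["Satya"]), x)), x))

print(two_hops)   # the traversal result itself is unaffected by store
print(sorted(x))  # what a cap-like step would return: everything collected
```

The traversal still outputs only the two-hop result, while `x` holds every report seen along the way, collected in a single pass.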
>> What do we have next in the remaining minute?
>> So, as I was talking about this side-effect step,
I also touched on barrier steps.
Which is basically, as you are executing,
you say, stop here, give me
all the results that you have calculated so far.
So, this was the cap step here.
>> These are well named. I like these names.
>> Yeah.
>> It is a barrier here within the pipeline.
>> Yes.
>> So, when you said "Cap",
it's basically telling it to gather
all the output and give it to me.
>> And people who understand Gremlin will be
appreciating this experience because
this is exactly what they're doing all day long,
except now they get a big cloud-scale,
globally replicated database to do it with.
>> Yeah.
>> So is this available?
People can use Gremlin and Cosmos?
>> Yeah.
>> No problem.
>> No.
>> It's one click when you make your Cosmos DB account.
>> Yeah.
>> Is there anywhere
that people can learn more about this,
or Docs specific to Gremlin and Cosmos?
>> So, there are Docs on our website,
where people can learn about the steps.
So, we document all the steps
that we support but you can always go to
[inaudible] website where these steps
are clearly documented and there are examples.
>> And then also on Cosmos DB there's a section
called the graph introduction
that talks about these things as well.
>> Yeah.
>> Thanks so much for sharing with us today.
>> Okay.
>> All right. I am learning all about working with graphs
and Gremlin on Cosmos DB, here on Azure Friday.
