>>> Glad everyone could make it to the Sheraton. I didn't know we had
sessions here. Crossing the street wasn't too
bad, so thanks to everyone for
being here. This is the Cosmos DB graph database session. We will learn a few things about
graph databases, and we have the honor of having three guests from
some of our customer cases. They will talk about their
specific implementations, how they use graph, and how it brings
value to their applications and their businesses.
So, yeah, Luis Bosquez.
I look after the graph database part of it.
You can reach me by e-mail or Twitter. If you're familiar with Cosmos
DB, the team loves Twitter. We have our hierarchy based on
how many followers everyone has, and that defines our rewards, so please follow me.
I need this. Just kidding.
>>  [Laughter] >>  So, yeah, we'll get
started because we have a lot of content. We'll have a few minutes at the
end for Q&A if there are any questions on either the
general topic of the Cosmos DB Gremlin API, which is the graph API,
or any of the specific implementations that we'll hear about from some of our customers
presenting here. My name is Luis.
Originally from Mexico. I've been working at Microsoft
for six years, three of them as an intern and three of them as a
full-time employee. As for previous experience, I've
worked at Microsoft before; previous products are SQL Server
on Linux, so bringing SQL Server to other platforms as well as
containers, and connectivity for Azure databases, and now everything since 2017 is Cosmos DB for
graph databases. I have some samples on
graph databases on my GitHub. You can connect with me on
LinkedIn with that alias. So the main topic here is the Azure
Cosmos DB Gremlin API. How many people have heard about Cosmos DB?
How many people know about Cosmos DB, like at a level-100 understanding?
Okay. How many people here have
implemented Cosmos DB in an application, other than our
customers, of course? Cool. So, good,
30% of the room. And him as well.
Okay. So I guess I won't have the need
for the next slide, but here it is anyway.
As you know, Cosmos DB has APIs on top which you can use to talk
with the database in any of the data formats in the
middle, to get all the benefits at the bottom.
A couple of announcements that we had at Build were database-shared
throughput, so you can specify a level of throughput that
will be shared across all the collections in a specific database,
bulk import utilities, and much more.
So for those of you who have used Cosmos DB, you're probably
familiar with the notion of an API in Cosmos DB. An API is just an implementation
of the wire protocol for each one of the database flavors.
So if you create an account, it will have a specific flavor, which
can be MongoDB, Cassandra, or SQL. It used to be known as Document DB.
There's some confusion all the time; we always get questions at the
booth like, is Document DB the same thing? It's now Cosmos DB,
and that API is just one of the subsets of the services that it offers.
Assuming that all the functionality you need in
your open-source database model, as well as
your client, is supported on our side, theoretically you can
take your database, your application, and your driver,
just change the connection string to Cosmos DB,
and it will work against what was previously an existing MongoDB or
Gremlin database. So we rely on open-source
connectivity to make it easy to use whatever
programming platform or framework of your
choice against Cosmos DB and enjoy the benefits like
throughput, SLAs, and all the distribution parts as well.
This is a fully managed database service that handles global distribution,
elastic scale-out, et cetera. Is the sound okay?
Okay. So in a sense you can perceive
every one of the APIs as a separate service of its own, with
all of the core benefits that Cosmos DB overall provides.
Let's talk about graph databases. A graph database is not really
a new concept; it's been around for a while,
it has some big proponents, and
there are a lot of flavors around it. There's an article I like to share a lot.
It's kind of old, but back in 2014 it said that
analysts predicted that 25% of enterprises would be using graph
databases by 2017. We have seen that in graph user
scenarios that involve things like predictions and
recommendations, exploration of different types
of metadata on top of data, as well as the graph database being
the main solution on its own. So I recommend reading it,
just for some inspiration on what these cases
achieved with a graph database. That begs the question: well,
what is a graph database? How many people here have used
graph databases in the past? Okay. Good.
So a good 30%. As some of you know, a graph
database falls within the category of NoSQL, or non-relational, databases.
One of the properties that
defines it as a NoSQL database is that once you insert the data you don't
define a constraint on the data as you insert it;
you define it as you read it. When you read the data, you
define what properties you specifically want, and that
infers the structure of the data itself.
This is applicable to many other NoSQL databases, including
document databases like the SQL API, et cetera.
The one specific thing about a graph database is that it's optimized
for processing highly connected data, calculating
relationships and operating on that end. So, there are
many graph database standards that are implemented by many of the
on-premises services. The Cosmos DB Gremlin
API is based on the Apache TinkerPop graph standard. It is an open-source standard,
with TinkerPop and the community that comes with it, and
of the two flavors of graph databases, this one is known as
a property graph. It has vertices, which
represent objects, and edges, which represent relationships between
objects, and both can have an arbitrary list of key-value
properties to enrich the data set. We'll walk through a fine
example of how this plays out, and the fact that it's part of
an open-source standard means that there's an ecosystem of
solutions, frameworks, samples, just all the goods that come
With being an open source project. So as i mentioned graph
databases have been around ever since the early
2000s, with Oracle Spatial and Graph and others.
The Apache TinkerPop project came up in 2009 and has been developed up
to version 3.3.x. The team in Cosmos DB, including
myself and some of the engineers, are among the contributors to
the Apache TinkerPop project. We have discussions, and
sometimes arguments, about how the project should move forward,
to stay aligned on providing enough functionality
for people to use it against Cosmos DB, as well as for it to
be a standard in the future of graph databases themselves.
We added support, in preview, ever since I think
August of 2017, and in January 2018 we actually
released the generally available version of it.
So today Cosmos DB's Gremlin API is the only graph database as a
service, as a fully cloud-managed service, that is in
general availability. There are other cloud graph
database offerings that are not released yet, but Cosmos DB
is still the only one that is out there.
The popularity of Cosmos DB on its own made the
graph database option go up to second place, just behind Neo4j.
This ranking is actually based on how popular the
database is in terms of Stack Overflow questions, in
terms of GitHub contributions, in terms of job postings, and
other things like that that define a database engine.
The one conclusion you get out of it is whether a database engine
is going to continue existing, based on its demand and the
community that's built around it. As I mentioned, all the
Cosmos DB features are available through this.
It's fully managed, horizontally scalable, and has the major
features, the ones listed below. We'll talk about what makes a
graph database and how you can use it to model objects.
I like to say that this example is the only thing that you need
to know about graph databases to start using them in an actual solution.
After that, we'll talk about some of the implementation patterns
and some of the solutions you can build around that.
So what is a TinkerPop graph? There are two objects:
vertices and edges. The best practice is using the
label as the type of object that you are defining.
So, for example, if you have a graph object named Luis, it is of
type person, and that's something you denote inside the
label property. Then it can have a list of multiple properties, as well as
the edges. I got some questions about what
the best practices for edges are.
Edges, you should name them after the type of relationship
they semantically represent with the type of
object they are targeting, so it's scalable in terms of
syntax and semantics.
We'll walk through an example of how to use a graph database.
Assume the following graph. We have Kobe Bryant as one of
the objects.
We'll add a few objects: first a vertex with the label person and
the properties age and height. Then we can add another object
that denotes that Kobe Bryant is part of the L.A. Lakers, so
we have a team vertex and an edge denoting that he's part of that team.
Now, to increase the data in this example, we have the fact that
Kobe Bryant has won five NBA championships, and in 2018
something pretty historical happened:
Kobe Bryant was the first NBA player to win an Oscar.
So he points to the 2018 Oscar he won for Best Animated
Short Film. This is a pretty well-connected graph, until you
insert a vertex that's Tom Cruise.
He hasn't won an Oscar or an NBA championship; far from it.
So how do we make this a connected graph? Just for the sake of the
modeling, they're both connected because they are Hollywood
celebrities, regardless of how many awards they have won.
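As an illustration (my own sketch, not the session's code), the graph above can be modeled with labeled vertices and named edges. In Gremlin you would build it with addV()/addE(); here plain Python dicts stand in for the store, and the out() helper mimics the shape of g.V(x).out(label):

```python
# The talk's example graph as a minimal in-memory property graph.
# Each vertex has a label (its type, per the best practice above)
# plus arbitrary key-value properties.
vertices = {
    "kobe":   {"label": "person", "name": "Kobe Bryant", "age": 39, "height": 198},
    "lakers": {"label": "team",   "name": "L.A. Lakers"},
    "nba5":   {"label": "award",  "name": "5x NBA Championship"},
    "oscar":  {"label": "award",  "name": "2018 Oscar, Best Animated Short Film"},
    "tom":    {"label": "person", "name": "Tom Cruise"},
}

# Edges are named after the relationship they represent: (source, label, target).
edges = [
    ("kobe", "playsFor", "lakers"),
    ("kobe", "won", "nba5"),
    ("kobe", "won", "oscar"),
    ("kobe", "knows", "tom"),   # both are Hollywood celebrities
]

def out(vertex_id, edge_label):
    """Follow outgoing edges with a given label, like g.V(x).out(label)."""
    return [dst for src, lbl, dst in edges if src == vertex_id and lbl == edge_label]

# What has Kobe won?
awards = [vertices[v]["name"] for v in out("kobe", "won")]
print(awards)
```

The point of the sketch is that adding Tom Cruise, a vertex with no awards, required no schema change, only one new vertex and one new edge.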
So this is an example of how we can use a graph database to model
real situations, regardless of what you
have in the data itself. Now, what about scale?
Partitioning is handled automatically in Cosmos DB, so
the only detail that you have to take care of is adding a
partition key if you create an unlimited collection.
There are two types of collections: a fixed collection, which has a limit of 10 gigabytes, and
an unlimited collection, which does not have
any size or throughput restrictions;
it can be an effectively infinite data set.
Every physical partition will have enforced limits
of 10 gigabytes in size and 10,000 request units each.
This is something that's complicated
to handle in the back end, but it's done automatically;
all you need to define is the property that acts as the partition key.
There may be cases where you find a graph that's connected
across two partitions.
This is a cross-partition query. It has additional latency,
because the partitions have to be scanned so that the connection can be
found between them. What we do to optimize that is
we automatically locate the edges in the same partition as their source vertex, so
you only have to care about the partition key itself.
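To make the mechanics concrete, here is a hedged sketch of how hash partitioning behaves. The md5 routing below is an illustrative stand-in, not Cosmos DB's actual hash function, and the team names are invented:

```python
import hashlib

def partition_for(key, n_partitions=4):
    """Route a partition key value to a physical partition.
    Cosmos DB hashes the partition key; md5 here is just a stand-in."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % n_partitions

# Hypothetical vertices partitioned by a "team" property.
vertices = [
    {"id": "kobe",   "team": "lakers"},
    {"id": "lebron", "team": "lakers"},
    {"id": "curry",  "team": "warriors"},
]

# Vertices sharing a partition key value always land in the same
# partition, so a traversal scoped to one key stays single-partition.
same = {partition_for(v["team"]) for v in vertices if v["team"] == "lakers"}
print(len(same))  # exactly 1 partition touched

# A query spanning both key values may touch more than one partition,
# which is where the extra cross-partition latency comes from.
cross = {partition_for(v["team"]) for v in vertices}
```

The design takeaway is that choosing a partition key that keeps related vertices and edges together is the main modeling decision left to you.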
So when should I use a graph database?
Well, if you're coming from the relational world, a graph
database usually solves some of the friction that comes
from the fact that relationships are denoted as just an
additional data element, as opposed to a constraint or a
connection between a bunch of tables.
If you have hierarchically
structured data in the same table, in the same rows,
you may benefit from a graph database. If you have groups, or groups and
then N levels of hierarchies on top of that, maybe a graph
could be a better solution, or if you have really complicated queries.
If your query is mostly joins,
like say 80% of it is joins, then your queries probably express
more relationships than they extract data from the table itself.
And there's just flexibility in the data model: the fact that we could insert a Tom
Cruise means any kind of business
pivoting could be supported. That was a joke.
So there are no constraints on adding new data;
you collect it first. So, is your data naturally a graph?
We'll talk about a very brief example. On the left side of the
screen we have the relational implementation for this solution,
and on the right side of the screen there's the graph
representation of it. So we have an HR data set.
Things work out okay. There are two people in the sales
group, one person in the engineering group, and my manager,
Rima. Now, what happens if, for example, in a company kind of
like Microsoft (Microsoft does this all the time)
you say, okay, what if an employee can belong to
multiple groups? So there's a sales and an engineers group.
Well, to implement that in the relational world you would have
to create a new junction table that holds the
relationships between the groups and the employees, and then map the
fact that an employee can belong to different kinds of groups.
In the graph database world you just add
an additional edge saying the engineers
group now also contains that employee; relationships are just documents.
In this case we also have to add an Azure group on top of the
sales group and engineering group. For this, you know, I only
worked on SQL Server for two years, so you might have a better
solution for this in the relational world.
What I did is I extended the employee-and-group relationship
table and I added a groups table, which is at the
bottom, which is done at a total addition of one table, six rows,
and two new columns, whereas in the graph case we
only have to add the vertex for Azure.
I heard there was some contest? No. This is just what I came up with,
as if I were the developer for this solution.
Let's complicate this a little more. What if we want to use this data
to also store hierarchies of reporting? So, who reports to whom, et cetera.
Hierarchies in this case would mean adding two documents on
the graph side, just denoting the direct-report
relationship, but in the relational case we have to add an
additional table against the same existing table to describe how
the relationships happen inside of it. I see a few nods, a few weird looks.
There might be better solutions; as a caveat, this is just what I
came up with. So once we have this data structure, let's see what it
would take to add a new employee.
Adding a new person here would mean that on the
employee table we have to add that row, and then we have to add
all the foreign-key relationships for employee and
group, and all the foreign-key relationships involving the employee ID, as well as the super
group that contains all of the groups, whereas on the graph side we can just add
the vertex and the new edges. Let's complicate this a little more.
What if there's a super, super group, Microsoft?
Microsoft owns Azure, and Azure contains the two groups.
On the left side are the changes that you have to make to your data
model in the relational world, whereas here it only involves
adding two more documents. So this is where graph wins on modeling.
When you have N degrees of hierarchies that contain
existing objects, that's where a graph database
really does better in data modeling.
Here I just wanted to map out another situation: as a
person that has a Hispanic name, we also have our second last
name as part of our names, and sometimes
government-like forms require that, sometimes they don't.
In the relational case we would have to add an additional column for
second last names and keep it null when there's no second last name,
whereas here I can just add an additional property on the Luis vertex.
It's the same whenever the data elements themselves have to change.
Let's look at what it would take to query this.
This was a bit of a challenge in both situations.
The query was: get all the managers under the engineering group.
On the graph side, the query means: get the
engineering group vertex, then under engineering go out and get
all the members of the engineering group. That will get everyone
under the engineering group, which is Andrew and the new person
(I didn't bother to name them). Then go in on the
direct-report relationship and get the name. That will get only Rima, as the manager of everyone under
the engineering group. Now on the relational side (again,
this is the solution that I came up with),
I basically had to use a subquery to get
the employees that have reports, so you get the ID on the
left side from the table that holds the relationships
between the employees and the direct reports, and I do an inner
join on the group just to filter on engineering.
So this is just a demonstration of how different solutions
have different levels of
complexity at the time of querying data.
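The graph traversal described above can be sketched like this (my own toy example; the edge labels contains and directReportOf are assumed, roughly mirroring a Gremlin shape such as g.V('engineering').out('contains').out('directReportOf')):

```python
# Tiny in-memory version of the HR graph from the talk.
# (source, edge label, target); directReportOf points from employee to manager.
edges = [
    ("engineering", "contains", "luis"),
    ("engineering", "contains", "andrew"),
    ("sales", "contains", "dave"),
    ("luis", "directReportOf", "rima"),
    ("andrew", "directReportOf", "rima"),
]

def out(vertex_id, label):
    """Follow outgoing edges with a given label."""
    return [dst for src, lbl, dst in edges if src == vertex_id and lbl == label]

# "Get all the managers under the engineering group":
# hop 1: group -> members, hop 2: members -> their managers.
members = out("engineering", "contains")
managers = sorted({m for employee in members for m in out(employee, "directReportOf")})
print(managers)
```

Two hops replace the junction-table subquery plus inner join described for the relational side.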
So graphs are good for hierarchies. A few implementation patterns
are the following. What can you build with a graph
database? Among the solutions that
have been implemented on graph databases, we have social
networks and monitoring systems as some of the
main solutions that are out there. This slide shows a few of them,
and we'll walk through some examples of what this means.
If you have a graph that defines a couple of people and how they
belong to a group, a common pattern in property graphs is a
recommendation engine, and recommendation engines are based on closure: if two objects share connections, chances are they are connected themselves.
If Andrew and Luis both know Rima and belong to the Microsoft
Micro Players team, the chances are that Luis knows Andrew.
If Luis belongs to the Microsoft Micro Players and purchased the
Cosmos DB package, and they both know Andrew and have the same
connections within the graph, then chances are that Andrew
would also purchase it. This is done through,
again, just Gremlin queries: queries that explore the relationships between each of
these objects. Now, in the case of a monitoring
system: say you have different servers and different
databases, and a hub that serves traffic, a load
balancer or whatever; that's not relevant.
Assume one of the databases dies. A monitoring system that
could be built around a graph database would be a system that
looks like this. We have different vertices denoting
the databases, then different edges connecting the databases to
the servers that query them, and
different connections up to whatever load balancer is on top
of that. In the metadata we can
have a property that says whether a database is active or not, and
then have a chain reaction on it, saying that if that database is
not active, because the main source data has been corrupted or something,
then you probably want to save on cloud
resources and put the dependent server to sleep, or if you have the ARM
template you can kill that server and not pay for the time
it's on. That's just another of the
solutions that can be built. We have seen examples of
this being implemented through a
monitoring system using the change feed in Cosmos DB,
with a system that's actually monitoring the
resources and storing all the metadata in Cosmos DB.
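A minimal sketch of that chain reaction, assuming an invented data model (servers with queries edges to databases, and an active property on each database vertex):

```python
# Hypothetical monitoring graph, not a shipped system.
databases = {"db1": {"active": True}, "db2": {"active": True}}

# (server, "queries", database): which servers depend on which databases.
edges = [
    ("web1", "queries", "db1"),
    ("web2", "queries", "db2"),
    ("web2", "queries", "db1"),
]

def dbs_of(server):
    """All databases a server queries."""
    return [dst for src, _, dst in edges if src == server]

def servers_to_sleep():
    """Servers whose databases are all inactive: candidates to shut down."""
    servers = {src for src, _, _ in edges}
    return sorted(s for s in servers
                  if not any(databases[d]["active"] for d in dbs_of(s)))

databases["db1"]["active"] = False   # db1 dies / its source data is corrupted
print(servers_to_sleep())            # web1 has no active database left
```

In the real pattern, flipping the active property would arrive via the change feed, and the traversal's output would drive the automation that pauses or deletes the idle server.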
The last example I wanted to talk about, a theoretical one, is a knowledge graph.
A knowledge graph is basically a combination of different data
feeds that maps the relationships
together to answer critical
business questions. It's not meant to be a processing system,
but more of a lightweight representation on top of one, and
there are systems that actually run aggregations on existing
transactional processing systems, then extract the metadata,
and insert it into a graph database. This can help answer
questions like: out of the coastal cities that are rainy, which of them claim
fish and chips to be their main food?
Disclaimer: all of them. All of them think their fish and
chips are good. Other questions: how many
coastal cities are rainy, how many cities claim to have good
fish and chips, et cetera? It's all about the data that you
can build on top of it by
extracting metadata from systems.
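Those aggregate questions reduce to simple property filters over the graph's metadata. A sketch with made-up city data:

```python
# Invented city vertices carrying extracted metadata properties.
cities = {
    "seattle":  {"coastal": True,  "rainy": True,  "good_fish_and_chips": True},
    "london":   {"coastal": False, "rainy": True,  "good_fish_and_chips": True},
    "phoenix":  {"coastal": False, "rainy": False, "good_fish_and_chips": False},
    "brighton": {"coastal": True,  "rainy": True,  "good_fish_and_chips": True},
}

def count_where(**props):
    """How many cities match all the given property values?"""
    return sum(all(city[k] == v for k, v in props.items())
               for city in cities.values())

print(count_where(coastal=True, rainy=True))                # rainy coastal cities
print(count_where(rainy=True, good_fish_and_chips=True))    # rainy + good fish and chips
```

The same vertices answer every variant of the question; no new aggregation pipeline is needed per question, which is the knowledge-graph point made above.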
For this I would like to invite, as I mentioned, one of
our real use cases, real user scenarios built on
a graph database. I would like to invite Dave, who comes from
JATO Dynamics, and he will tell us
about their use case today. I think it's on.
>>  Thank you. >>  Yeah.
>>  So thank you for having me over just to get this kicked off.
>>  Any time. >>  It's an
honor to be in front of you all. Luis just mentioned a couple of
things; first of all, the reference to Twitter. There's
an interesting story here. We were having a little bit
of an issue with some inserts in Cosmos, a long, long time ago,
and we actually connected with a chap called Denny, who may be in
this room somewhere, who wrote us a very specific application that
allowed us to bulk import our data in 20 minutes. This is
terabytes of data, and it was taking two days before.
So Twitter really does work with the Cosmos team.
It really does work. So, moving on a little bit,
swiftly here: we are the largest company in the automotive data
space you've probably never heard of. We are the only global supplier
of competitive analysis information, or automotive
intelligence, to the manufacturing industry.
We have competition within individual countries and markets, obviously, but we
are the only business that does this in a global context.
That's specifications around the equipment that you can have
in vehicles, you know, all sorts of stuff.
I just realized this slide is quite out of date:
we are in 58 countries and we have over 600 staff.
What we really do, our core value, is we allow our customers
to make really smart decisions on their product proposition
and pricing. We do that today, and I'll go into
the story in more detail, in a, I'm going to
be generous and call it a megalithic, context.
We have tons of people researching information,
and hundreds and hundreds of SQL databases shuffling
information around, ultimately generating the value that we
provide to our customers. So in the context of where we
are today and where we are moving to,
as Luis mentioned earlier, we have
created a knowledge graph.
So you look at this slide and it seems to make common sense.
A BMW M5 has either a manual
transmission or an automatic transmission.
You can't have both. In a SQL context we have
tons of rules, processing data everywhere, a very, very manual
activity. In a knowledge graph we simply add another node, a
vertex, and add an edge, which is
how we establish a relationship between
the two nodes, and that allows us to understand the difference,
to differentiate between the two. This allows us to do one very,
very important thing for our business.
We can correctly generate a 100% valid
vehicle in, I think it's about three or four milliseconds,
versus a present-day process that we've optimized over 30 years and that
still takes about 2.5 to 3 seconds running in C++.
So just by changing our paradigm from a transactional
system, we get a significant benefit for
us and our customers. So it's a huge
benefit to us. The other thing I should talk
about on this slide is that the advantage of being able to
simply add vertices and edges is we are
not constrained by a SQL data structure.
Those in the room know what SQL is, so I'll just make some
assumptions here. There are two ways to work with SQL.
You can have a denormalized structure, which is very, very
quick to query but very, very hard to change, and
solving that generally means you have to
change your database schema to give you the flexibility.
That might be 12 months, 2 years, 3 years, whatever it may be.
The one thing you can guarantee in a traditional RDBMS is the
data structure will have to change again and again and again,
constantly. Now, we were suffering
quite badly from this problem. We had normalized where we
shouldn't have and denormalized where we shouldn't have. Moving into this structure has
given us some significant performance benefits, and I mean
significant, but most importantly it's enabled us to change our
taxonomy and our data structure. So, the way we operate
in the research community is this: we research, I think it's about
3,500 data points on every vehicle on the market, going back 34, 35 years.
All of that information is coded into what we call the schema.
It's the JATO schema; nothing to do with the database
structure, just what we call it. The graph model means that we
can change our schema at will,
with no impact. We add a new schema item and start to research it.
The application can then answer a different question, and the power
that comes out of that, the insight it can gain just by being able
to make a simple change, is unbelievable.
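That schema flexibility can be illustrated with a toy sketch (a hypothetical data point, not JATO's actual schema): adding a newly researched attribute is just a new property on a vertex, with no table migration:

```python
# A hypothetical vehicle vertex; properties are just key-value pairs.
vehicle = {"label": "vehicle", "make": "BMW", "model": "M5"}

def add_data_point(vertex, name, value):
    """Start researching a new attribute by simply attaching it to the vertex."""
    vertex[name] = value
    return vertex

# New schema item, no migration, no downtime; older vertices simply
# lack the property until they are researched.
add_data_point(vehicle, "towing_capacity_kg", 2000)
print(sorted(vehicle))
```

In a relational model the same change would mean an ALTER TABLE (or a new table) rolled out across every environment before research could begin.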
Generally, if you're going down this road, I would
definitely recommend that you invest as much time as you can in
modeling before you start, from the implementation perspective.
That's where you start to understand maybe not the depth
of your data but the breadth of it. A graph database,
particularly a knowledge graph, is all about metadata, not big
data. Thank you. >>  Okay.
So I talked about some of this slide already.
The takeaway here is that simply adding a new vertex or a
new edge means that you can generate completely new insights
from existing information. Completely new. The way that we think about this,
again going with the BMW example:
BMW has a professional
media pack, and Audi has a media pack and a business pack.
One of the things we have to do, in terms of research objectives,
is understand the contents of all of those things.
That's how we provide the competitive insight
back into our customer base, you know, the manufacturers.
Maintaining the links between those pieces of information and
then generating the queries against them is very, very complex
in a relational database world, because you're tied to your
schema, the performance of your schema, and normalization or denormalization.
In this context you just add a new edge and link the two things directly.
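A sketch of that idea, with invented pack names: linking two option packs across brands is a single new edge, after which finding competitors is a one-hop traversal rather than a cross-schema join:

```python
# Hypothetical edge store; pack names are illustrative, not JATO data.
edges = []

def add_edge(src, label, dst):
    edges.append((src, label, dst))

# One new edge links the two brands' packs for comparison.
add_edge("bmw_professional_media_pack", "competesWith", "audi_media_pack")

def competitors(pack):
    """Follow competesWith edges in both directions from a pack."""
    return ([dst for src, lbl, dst in edges if src == pack and lbl == "competesWith"]
            + [src for src, lbl, dst in edges if dst == pack and lbl == "competesWith"])

print(competitors("audi_media_pack"))
```

The edge is data, not schema, so new cross-brand linkages accumulate without any model change.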
Think of the insight that you can gain from moving beyond just having
structured rows of information to actually having linkage
between different data sets. One of the things that JATO prides
itself on is that we are able to go back 30 years and tell you, from
the very, very first Ford Fiesta, no, it's a Ford Escort way
back in the day, for every single version of car
that Ford has released up to now, we have linkage.
We have that linkage in a graph context now because it's a
knowledge graph. Maintaining that linkage of
information in a SQL database, some of those queries were
nearly 4,500 lines long.
Now it's a single-line query,
because of the linkage, in a knowledge graph context.
>>  Global distribution. >>  Excellent. Thank you.
So one of the real problems with being a large, well, small but
globally distributed, business is that you really want your data where
your customers are. All of our infrastructure
today is hosted in west London. Uxbridge. Don't visit.
You're not missing anything. Now we have the capability to
host our data, in a single click, replicated, in any region
in the world, and I don't have to do anything.
I tick and done. Tick and done.
Untick, save some money; it's brilliant. Think about the level of capability,
the flexibility, the resiliency that you're getting from this
when you move away from being tied to physical pieces of
infrastructure in data centers, where in a SQL context you have to
manage availability zones yourself. None of that exists in the
Cosmos context, and it's very, very quick.
They say low latency; low latency is not doing it justice.
The multi-region write performance is phenomenal. This is on terabytes of data.
We are not at petabytes yet, but we are working on it.
So here's my piece of insight that I want to share
with you guys more than anything. The journey of moving from
a traditional SQL, on-prem RDBMS context into Cosmos can
seem very, very daunting when you start.
I think you can look at it in a number of different ways, and we've been on a journey with this.
We started with Data Factory and very quickly
realized it wasn't going to give
us the performance or the cost point; the price was excessive when we
ran into significant demand. If you think of the journey from
SQL Server into Cosmos DB, the way we've structured it is we
have a staging format, which is the data lake.
So we have Azure Data Lake around everything: everything gets dumped in there
in its raw, native format. We don't lose meta information;
it's all just as valuable information.
Then we process it. If your data is not clean and
tidy, you won't get the results you want, so don't skip that step.
We move it into a document database first. Why would you do that?
We use the document part for localization.
Everything we research is in what we call JATO English, which
is British English with a couple of American words here and
there. It's mostly English.
We use the text services to localize that
for us, because why should we do it? I don't want to do it.
Let's let the machine take care of that.
So we built the document database specifically for that
purpose, to create what we call a standard data model.
That's all the facts for every vehicle, all the options,
and all the rules that sit between all the
options on cars. We then process the standard
data model into the graph. We cleaned our data,
normalized our data, and then put it in the graph.
That means that we know, because we measured the processes,
that we have 100% accuracy.
Data Factory is the right tool for many scenarios;
for us it was a little bit like using a sledgehammer to crack a walnut.
We didn't need that level of enterprise capability for what we
are doing, so we rewrote the whole thing.
I mean, this is really, really complex stuff.
It took about three hours. We wrote it in Azure
Functions, and that's giving us a major benefit: we can process things on a
trigger basis, and we are only paying for the import service,
the standard data model, and the graph database when there's something to process.
The cost base is definitely, definitely worth considering as
you go through this process. The data migration
has been a labor of love for us,
nothing to do with the
engineering or implementation: 30 years of interpretation of
how we should research this versus that.
We still have an issue today that some of our regions refer
to research in one particular context, so we've had to create specific
functions to translate that and normalize it into one way.
That's the thing that's taken the time for us, not the mechanics.
So I think that's pretty much all I wanted to say.
I'll hand it back over to Luis. I think we have some Q&A at
the end. >>  Thank you.
>>  [Applause] >>  All right.
Thanks, Dave. So our next guest here is Ramsey.
He comes from the Microsoft supply chain organization, and he will share a
few details about the use case for supply chain. >>  Thanks.
>>  Yep. >>  So we produce first-party
hardware like Xbox and Surface.
The nature of our data is very, very highly connected and very hierarchical,
so it turns out that a graph database was the perfect solution to get some deep insights.
One of our main goals is to always provide maximum value for
our customers, and to make sure that we are also looking for
ways to reduce costs, improve quality, and make sure
the level of service is really good. In order to do that you have to
get insights from cross-functional parts of the
supply chain, whether it's manufacturing, sourcing,
delivery, shipping, or stores.
We need to find a way to cross reference all that data.
So we started thinking about, well, how do we get all the --
We have many, many different subpoenas underneath that go
Across these functional areas and we needed a way to say -- we
Needed it all in once place to ask questions to enable our
Business to get these insights with the help of ai and machine
Learning to say these are the opportunities, you know, this one supplier is starting to look
Risky, now go find all the different parts and all the
Different devices that were produced by this supplier.
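The question just described, find every part from a risky supplier and every device built from those parts, boils down to a two-hop graph walk. Here is a minimal sketch in plain Python with invented names; the production system would express this as a Gremlin traversal against Cosmos DB rather than an in-memory structure.

```python
# Sketch: supplier -> parts -> devices as a two-hop walk over a tiny
# in-memory graph. Vertex/edge names are illustrative only.

from collections import defaultdict

class TinyGraph:
    def __init__(self):
        self.out_edges = defaultdict(list)  # vertex -> [(edge_label, vertex)]

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))

    def out(self, src, label):
        return [d for (l, d) in self.out_edges[src] if l == label]

g = TinyGraph()
g.add_edge("supplier:Contoso", "supplies", "part:battery-7")
g.add_edge("supplier:Contoso", "supplies", "part:hinge-2")
g.add_edge("part:battery-7", "used_in", "device:Surface-X")
g.add_edge("part:hinge-2", "used_in", "device:Surface-X")
g.add_edge("part:hinge-2", "used_in", "device:Xbox-S")

def devices_for_supplier(graph, supplier):
    """All devices containing any part from a (risky) supplier."""
    devices = set()
    for part in graph.out(supplier, "supplies"):
        devices.update(graph.out(part, "used_in"))
    return sorted(devices)

print(devices_for_supplier(g, "supplier:Contoso"))
```

In Gremlin the same walk would look roughly like g.V().has('supplier','name','Contoso').out('supplies').out('used_in'), which is why a question that once spanned isolated systems collapses into a single traversal.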
So this is a very high level type of architecture diagram
That represents what is happening in the supply chain.
You have all these data producers on the left: factories and IoT devices.
There's parts that are constantly failing.
That feeds back to let us ask the question, okay, this one
part that keeps on failing in testing and that we have to continue to repair, can you locate all
the different suppliers that are giving us that part?
Not only that, we just want --
now I need to say find me all the
stores that
have this part and find all the current devices being shipped, so
en route or with customers currently. So that took us about 4.5 days
When we had to go across the different
Isolated systems. I think that we'll be able to
get that down to minutes now. So on the right side -- so this
is kind of like a lambda architecture. We are getting this realtime
data stream and we are going to fork it into two places, the hot
path operational stores -- so the store is a key thing -- and on
the right side it's going to be more the analytics and
reporting. That's very important still. It doesn't go away.
We are going to take some information from the right side
And -- like all the different predictive models and say, okay,
This is predicting that this one part is going to skyrocket in
Price or this one supplier is starting to violate an ethical
type of rule, can you locate the devices based on what you are
seeing on the right side, the analytics -- can you find the
information on the left? Sometimes you just want to show
Events that happened inside the environment.
So they just want to calculate the price of materials and be
Able to project what are the right suppliers for the parts.
So this is an example.  This is a very specific example that we
started doing as a POC for sourcing. This is a bill of materials.
We had some very interesting discoveries.
Actually, there was a couple of eureka moments we had where we
were just really blown away. So to price a bill of materials
right now in a SQL database, it was taking around 3.5 minutes.
You know, when we imported all the data, again, thanks to
Cosmos DB with their bulk insert tool, we were able to quickly
model this. So the thing I love about Azure
is it's so plug and play and you can quickly get stuff up and
running to prove your POCs. This was done in about two to
three days.
So what we quickly found out is what took 3.5 minutes now
takes us seconds. Here's the design and -- it's
Not just the performance but how you model your data.
What happens is there's a program at the top that pins a
material, which is an Xbox or Surface or a pen, and there's
Multiple parts to that device and then there's also multiple suppliers.
What we started doing was putting master data on the
edges. That's one of the key things
that helps performance. So you have this data that's
telling the story between the two vertices and that's critical.
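One way to picture "master data on the edges": the part-to-supplier edge itself carries the sourcing split and unit cost, so pricing reads properties off the relationship rather than joining tables. The numbers and names below are invented for illustration; this is not the team's actual schema.

```python
# Sketch: sourcing master data stored on part->supplier edges.
# Each edge carries the split percentage and the unit cost, so a
# blended price is a walk over edges, not a relational join.

supply_edges = [
    # (part, supplier, split %, unit cost) -- illustrative values only
    ("part:hinge-2", "supplier:A", 80, 1.50),
    ("part:hinge-2", "supplier:B", 20, 1.90),
]

def blended_cost(part, edges):
    """Weighted unit cost across suppliers, using splits stored on edges."""
    return sum(cost * split / 100.0
               for (p, _supplier, split, cost) in edges if p == part)

print(round(blended_cost("part:hinge-2", supply_edges), 4))
```

The same idea carries the risk indicators and audit results mentioned later: they become extra properties on the same edges, available to any traversal that crosses them.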
To us graph is a place that you can tell stories.
What is happening. There's a relationship, okay,
tell me more about the relationship. So there's some kind of -- I
don't want to go too deep into this but there's split
information on the edge between those two parts.
That just means that you can use this part and have another
Alternative part that you can use inside this assembly.
The same thing at the bottom, you have the supply information
And for one part we can source it from different suppliers and
then we have the breakdown of the cost and the split to say,
oh, yeah, 80% of this part is going to come from one supplier
versus another. So what is going to happen now
Is this is just about pricing but we are going to start adding
Risk indicators about the suppliers. We also have these audits that
Happen in our supplier areas, we
Go to their places and look at a bunch of things.
Are they being humane, using the right types of standards so that
Will also go into this information that is sitting
There on the edge. What we are going to do actually
Is we are building this capability now where when you
Hover over all of these edges you get this line chart that is going to have this information
that you can get instantly, where previously you needed to do that
with something like Power BI. Okay. So this is kind of the high
Level of what is going to happen. Data comes in, it could be from
Event grid, realtime or it's coming from other services, we
are pumping it into an intermediate
stage and then we are pumping it onward. They know the data, how it
relates to other data, and then it's put into the graph database.
We know that Gremlin is not that easy to use, so we are training LUIS
so people can just ask natural
questions; we then extract
entities from that and then we can run the Gremlin queries for
them. So somebody can say, hey, can
you please price a BOM for this program for that material.
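A sketch of that natural-language step, with a trivial regex standing in for LUIS and an invented query shape. The real service resolves intents and entities with a trained model, and the Gremlin text here is illustrative only, not the team's actual schema.

```python
# Sketch: reduce a natural-language ask to entities, then bind them
# into a canned Gremlin query string. The regex is a stand-in for LUIS.

import re

def extract_entities(utterance):
    # stand-in for LUIS: pull "program X" and "material Y" out of the text
    program = re.search(r"program (\w+)", utterance, re.I)
    material = re.search(r"material (\w+)", utterance, re.I)
    return {"program": program.group(1) if program else None,
            "material": material.group(1) if material else None}

def build_bom_query(entities):
    # parameterized Gremlin string; real code would submit it to the endpoint
    return ("g.V().has('program','name','{program}')"
            ".out('pins').has('material','name','{material}')"
            ".repeat(out('contains')).emit()").format(**entities)

ents = extract_entities("please price a BOM for program gold material brown")
print(build_bom_query(ents))
```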
I have a demo, i don't know if it's going to work but we'll
Give it a shot. I think that i'll show you a
Quick demo.
I'm just putting in a fake program, fake material.
I'm not allowed to show it.
So that was literally under 2 seconds. Actually now most of the time
It's just building the tree. So this is an idea.
This is actually like a pen, i'm just putting all the parts and
Making it fake. But this is based on a real bill
Of materials. You can see that it was super
Fast to build. I'm just going to show you the
query. That's pretty much it.
Before, this was a crazy recursive SQL thing that had taken us so
much time. This is not just the hierarchy
But some of the calculations. This is just the bill of
materials, but not everything is costed. So I'm going to run a costing model now.
That will just bring me the parts that we actually pay
For. So that's it.
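The costing run just shown amounts to a recursive walk over the BOM tree that sums only the costed (purchased) parts, which is what a Gremlin repeat/emit traversal gives you in one query. The tree and prices below are made up for illustration.

```python
# Sketch: recursive BOM costing. Walk the tree of parts and sum only
# the nodes that carry a cost -- not everything in a BOM is costed.

bom = {
    "pen":    ["barrel", "refill"],   # invented bill of materials
    "barrel": ["clip"],
    "refill": [],
    "clip":   [],
}
cost = {"refill": 0.40, "clip": 0.15}  # only purchased parts are costed

def price(part):
    """Recursively price a part: its own cost plus all costed descendants."""
    return cost.get(part, 0.0) + sum(price(c) for c in bom.get(part, []))

print(round(price("pen"), 2))
```

This is the step that the "crazy recursive SQL" was doing across joins; on a graph it is a single traversal plus an aggregation.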
What happens is we are invoking the function app that uses
the SDK that talks to the graph and it also does the
calculations on the fly. Now you can imagine that -- like
I told you before, i said it took 4.5 Days for them -- i'm
sorry, 4.5 hours for them to get all the pricing for the whole month.
Now we've got, you know, for one it's like under 2 seconds.
Imagine now that we could horizontally
scale this and say: calculate all the pricing
for us, and it's going to take minutes.  This is going to
unblock business thinking: now that technology is no longer
a constraint, one of the interesting things we can start
doing now is, let's start doing risk management against our
suppliers or other types of value-added business ideas.
Okay. So now one last kind of usage:
I'm going to try the LUIS thing. I'm just going to look at the invocation.
Let's see if it's going to work. Ask part price calculator for
program gold, material brown.
So that actually did work. I can't show you the names in
there but it extracted the entities using LUIS and said: I
ran this exact same thing on your screen but we exported to
Excel. Now we are thinking about all
these ways we can enhance our users' experiences using interconnected services.
Thank you very much. >>  [Applause]
>>  All right. Thanks, Ramsey. So as you have seen, we have two
Different scenarios on completely separate and
different industries. One of them is a bill of materials
kind of mapping and calculations on top of that, as well as
another scenario on intelligence gained from car data.
So right now allow me to invite on to the stage Glenn, who comes
from Archive360, to talk about another scenario for graph databases.
So please welcome Glenn Luft. Hello.
Thanks for coming. Luis asked me here for two
reasons: why did we move to Cosmos DB, in
particular the graph, and
secondly, how are we using it. So a little bit about us.
Archive360 manages customers'
data and brings it into Azure. We manage dashboard reports out
of mainframes, applications being sunsetted, live databases,
CRM, and capture for compliance reasons but also for data
mining, and
Office 365 e-mails. During ingestion we have options
Of transforming the data, parsing the
Data. We have an owner who wants to
See that data according to the original view.
So we have a style sheet that renders it back to that time as
To how the end user viewed it. A compliance officer has the
Option of seeing that data as it changed throughout time.
So we capture the information at each point, t0, t1, t2, t3, t4, and
then you can view it with a timeline view.
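That point-in-time capture can be sketched as append-only versions with an "as of" lookup; every change creates a new version and the viewer picks whichever one was in effect at a given time. The storage details below are invented; the real system keeps these versions in Cosmos DB.

```python
# Sketch: capture every change as a new version and read the item
# "as of" any point in time, which is what the timeline view does.

import bisect

class VersionedItem:
    def __init__(self):
        self.times = []   # capture times; assumed appended in increasing order
        self.values = []

    def capture(self, t, value):
        self.times.append(t)
        self.values.append(value)

    def as_of(self, t):
        """Value as the end user saw it at time t (None before first capture)."""
        i = bisect.bisect_right(self.times, t) - 1
        return self.values[i] if i >= 0 else None

doc = VersionedItem()
doc.capture(0, "draft")
doc.capture(2, "sent")
doc.capture(4, "redacted")

print(doc.as_of(3))
```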
Then we have the option of analysis based on Power BI
dashboards and our own reports
against the Cosmos graph. This is for people not fluent in
Gremlin. Last but not least, there's the
data scientist who loves Gremlin: knock yourselves out, go
for it. For us, metadata is everything.
So i'll give an explanation on why. So this was us back in -- well,
I'm from the 1980s to date myself. So 1990s we were working on
basically archiving information management of mainframe data.
Not only ingesting data
into the system, but also e-mail.
Then SQL Server was kind of creeping its way in so we put
our data in there. It was good for rigid, consistent data, and it
gave good performance, to a limited extent.
Then you hit about three years and you start hitting
performance issues. Why? Our data is hierarchical.
There's groups within groups. There's compound documents.
The data by nature is heterogeneous.
Then in 2012 we looked at MongoDB, we started leveraging
data warehouses, and started seeing fantastic
performance. You can also see what was getting to the
customers: the price tag. This is just an example of one
terabyte of metadata. That's our use case.
Yours might be different. This is us on cosmos db.
You can see the price point is much lower.
We are able to handle hierarchies, nested groups within
groups, and it's elastic. I'm not putting all my metadata
in Cosmos.  There's still things suited for SQL Server.
We are a federated metadata model. I have some data in SQL that
makes sense for a highly structured environment, and
anything that relates to hierarchies, groups, where I need to
add properties on the fly or take them away -- any
time I see nulls I'm like, that probably didn't belong in SQL.
That probably should go in one of the other two.
These are our customers today. What are they asking for?
They want not just to query and manage information per data
Type, let's say originally it was an e-mail, they want to
Manage data across everything in our system.
That's across petabytes of information.
They want to analyze, they want to
iterate. They want detection.
In the EU they wanted GDPR. They are concerned about
constantly looking for privacy information, flagging that and
immediately redacting it, either pseudonymized or fully anonymized.
Cosmos is helping me there because it's helping to give me
clues as to where violations might show up. This is how we use it.
We are basically doing user information relationships, information to information
relationships, and I have audit. Every single action that an end
user does in our system is audit-captured and maintained for the
Life span of the system. If that data is going to live
For 30 years that audit capture will live for 30 years.
Privacy and security -- you'll see the privacy and security
example in the second example in particular.
This is across billions of items. To
give you an example, a customer has 100,000 users; on average for
one data type we see about three billion items out of them.
That's just year one. We manage multiple data types
and they are all hierarchical, all groups within groups.
The first example at the top, basically they want to do
relationship mapping across disparate data and they want one
view to get to it and mine it. So that's it on the slide.
I'll attempt a demo. That's embarrassing.
>>  Scott Guthrie did it at the
Keynote. >>  I just can't.
No. So this is our front end web application.  This is a good
Example of the data. So i'm going to do a search.
I'm going to look for -- we used -- because I don't want to show
any customer data -- the old AdventureWorks database.
Yeah, it still lives. Never dies.
Here is that relationship view that i mentioned. Okay.
This is a user view, the end user has a special web ui and we
Grab that style sheet and we match it to this particular
Data. Okay. So nice and simple.
The end user also has the option of leveraging Power BI, so a
tabular flattening of the graph database, and you can view, of
course, pretty charts and so on. Great.
That's that. We built, for our own use but
also our customers', basically -- is everyone here familiar with
Graph Explorer a little bit?
It's highly extensible. The finance department, that's
useful to have because they have money and I find that useful, so
I'm going to walk through an accounts payable three-way match.
It doesn't get any more exciting. It's basically what this is.
There's a customer complaint
That comes to the finance department and they want to see
all the data and metadata relating to that complaint in
One view.  This is an example. We have that other view where
You saw where you could search for it but i thought i would
Show it in graph. It starts out with an end user
There. Their name.
They want to give us a purchase order. It comes from SAP and what this is going to do is it's
Going to basically do a link back to the system and it's
Going to bring up a view. So the purchase order was
brought into our system and the credit card information was
automatically anonymized. What that means is it was
privately encrypted and the key thrown away. If it wasn't, we would give
someone the key.  Just to be clear on the difference there. So we have a purchase order.
That's step one. The purchase order looked fine.
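The anonymize-versus-pseudonymize distinction made a moment ago can be sketched like this: both paths encrypt the value, but anonymization discards the key (irreversible) while pseudonymization escrows it (reversible for someone holding the key). The XOR "cipher" below is a toy stand-in, not real cryptography, and the real system's key handling is not shown in the talk.

```python
# Sketch: pseudonymization keeps the key, anonymization throws it away.
# Toy one-time-pad-style XOR for illustration only -- not production crypto.

import secrets

def encrypt(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(data, key))

def pseudonymize(value: str):
    key = secrets.token_bytes(len(value))
    return encrypt(value.encode(), key), key          # key is escrowed

def anonymize(value: str):
    token, _key = pseudonymize(value)                 # key is thrown away
    return token

card = "4111111111111111"
token, key = pseudonymize(card)
assert encrypt(token, key).decode() == card           # reversible with the key
print(len(anonymize(card)))                           # irreversible token
```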
Now the sales order. This gets into the relationship
Mapping. What this is doing is it's a
live look-up across the graph on the edges and it's building
this form on the fly. So this was a look-up against --
let's see, the top sales order, that was a vertex, and then we
have the customer and each of the sales order details that you
saw in the Graph Explorer.
Okay. There's also, associated out of
Salesforce, the opportunity itself.
Again, rendered on the fly, and there is the
opportunity, and then last but not least you have
some details about the account, again, from Salesforce.
So, again, we've -- from input sources we've taken it from an
ERP system, from a CRM system, from the original AdventureWorks
database and so on. In this example, although it's
Not rendering right on the screen, there's a time line view
And this shows each of the changes across time as the end
User inputted it on the sales force account.
And finally you have the e-mail of
complaint. They wrote it in Spanish and we
leverage Azure and we translate it.
And that's my demo of how we leverage graph across billions
Of items. >>  [Applause]
>>  Thank you. Thanks, Glenn. Okay.
So those are our demos and our guests for today.
We will open this space for Q and A as well as any other --
either on the specific implementations, we have some of our guests here to answer
questions, and I can answer other types of
questions on the database as well. So if there's any questions
There's a microphone in the middle of the room or we can
Pass this microphone around as
well. Before you go also remember to
send the session evaluation. Thanks. Yes? >>  The documentation on the
emulator says that it's not available for graph. >>  The emulator is not available
for graph and that is correct. >>  So can I run this on-prem
before I start racking up a lot of cost, so I can try it out and test it?
>>  Let me repeat the question. Is there a way to run Cosmos DB
graph on-prem, and there was an emulator to do that for the SQL API.
The answer for that is we are working on what we call a
complete emulator for all of the APIs and it will be
coming soon. In the meantime we suggest accounts for that.
>>  So it will be -- the short answer is no, [inaudible]
>>  That's correct. There will be one being released soon.
>>  Do you have a timeline on that? >>  I don't work on that part
specifically, but if you can send us a tweet we will probably give
you a better estimate. That's how it works.
>>  Do you have examples of the bulk load functions that were
being shown? That seems to be the key to this.
>>  So it's in preview right now. We can add you to the preview.
It's hosted on GitHub and that provides a sample on how to use it.
What kind of information were you hoping to get on the demo, on the --
>>  Just an example to see how it works and how to use it.
>>  Right. Well, I can describe it. Since it
has not been publicly released I can't show it yet.
It's in preview right now. Basically, in the bulk library,
instead of a single request with
a single document at a time, it makes a bulk operation
and uses a predefined Cosmos DB stored procedure for that.
So what you have to do in the application is build both the
logic for creating vertices and edges as well as adding all the
data that will be sent towards the endpoint. >>  I mean, is that being
Documented currently for the general audience?
>>  Not yet, since it has not been released in public yet.
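Conceptually, the bulk library replaces one request per vertex or edge with batched operations handed to a server-side stored procedure. Only the client-side batching idea is sketched here, with invented element names; the actual library's API was not public at the time of the talk.

```python
# Sketch: group graph elements into fixed-size batches so each batch
# becomes a single bulk operation instead of one request per element.

def batched(items, size):
    """Yield successive fixed-size batches from a list of graph elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# invented payload: 10 vertices and 5 edges to load
elements = [("vertex", f"part:{n}") for n in range(10)] + \
           [("edge", f"uses:{n}") for n in range(5)]

requests = list(batched(elements, 4))
print(len(requests))   # 15 elements in batches of 4 -> 4 requests
```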
>>  So when is the public preview going to be open?
>>  We expect that to be by the end of the month of August
for it to be publicly available. >>  Okay.
Thank you very much. >>  Thanks.
>>  Are there any other questions?
All right. Thanks for attending.
I was hoping nobody would ask this but there was -- if anyone
were to ask about SQL Server graph versus Cosmos DB graph I
had a slide ready for it. If you want some -- I'll just --
I'll just walk through it because we have some time.
SQL Server has a graph feature today, ever since SQL Server 2017 I think.
It's what we call a simulated graph, so it's a graph where you
have to build tables for vertices and edges, and they
introduced a new function in the T-SQL language that will
calculate all hops between two vertices. Now this is best suited for if
you have existing data in SQL Database -- I'm sorry, in either
SQL Database or SQL Server on premises -- and you don't need
complex graph calculations. So the examples where people
used metadata from SQL Server
and loaded it into Gremlin required calculations including subgraph
operations, traversals, et cetera.
The main difference is we implemented an open source graph
Standard so not only the functions that are defined by
The graph community that are common graph scenarios but we
also have the existing platforms, the existing
libraries that are open source that you can leverage towards it,
as well as, you know, the flexible nonrelational
modelling. In the end the SQL Server graph
Is a relational solution on top of it.
Nobody asked it but here it is. We were ready for that.
Yes? >>  [inaudible speaker]
>>  Well, that's a great
question. The question is -- I could only hear
the very last part, but I think I know where it's going.
Do we see the Gremlin API implementing the Neo4j standard? >>  Yeah.
>>  So we are working with the Neo4j team.
Do I have anything here? Okay.
The Cypher for Gremlin library. Yes and no.
We don't have any current plans to support it for the current year.
We have the library that has a very specific flavor for Cosmos DB.
So if you search for Cosmos DB inside of this repository you
will see that there's a decorator class that we helped
to define with the Neo4j team for a better translation of
openCypher to Gremlin. It's available as a driver, so
it sits on top of any other Gremlin-enabled driver or any other Cypher driver.
It does translation back and forth. So that was the catch.
But, yeah, that's what is available today for supporting
Cypher workloads. Okay.
Another question? >>  [inaudible speaker]
>>  So the question is, drivers today -- Gremlin drivers today
support two different kinds of interfaces, one of them is a driver
connection and another one is sending raw strings to an
endpoint. The second one is the one that's
supported against Cosmos DB, because there is a
communication format called bytecode -- I won't go very deep into it.
There's a communication format that we are working on getting
support for. There's a plan to support it and
the benefits for that include fluent API calls as well as
support for additional drivers.
So we are working on fully supporting GraphSON v2,
and that will be the baseline work for supporting bytecode drivers.
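The difference between the two submission modes can be pictured like this: a raw Gremlin string on one side, and on the other a traversal built as structured steps that the driver serializes (the fluent/bytecode style). The wire encoding below is invented for illustration; it is not the actual GraphSON format.

```python
# Sketch: the same traversal expressed two ways -- a raw string that can
# be sent to a string endpoint, and a structured "bytecode" form that a
# driver would serialize itself. The JSON shape here is illustrative only.

import json

raw = "g.V().has('person','name','alice').out('knows').values('name')"

bytecode = [("V",), ("has", "person", "name", "alice"),
            ("out", "knows"), ("values", "name")]

def to_wire(steps):
    # a bytecode-capable driver serializes structured steps, not a string
    return json.dumps({"@type": "g:Bytecode",
                       "step": [list(s) for s in steps]})

print(to_wire(bytecode))
```

The fluent style is what makes language-native traversal APIs possible, which is why bytecode support is the prerequisite mentioned above.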
>>  [Inaudible speaker] >>  Everyone is obsessed with time frames.
It will be ready when it's ready. Just kidding.  It's hard to tell.
We have some additional priorities.
We are building the libraries as our main priorities right now:
service stability, fixes, on top of getting the latest Azure
compute. For now it's on the back burner. Sorry.
>>  [Laughter] >>  send us a tweet.
Follow me on Twitter. Send a tweet. All right.
Well, if -- i think that is all. Thanks for coming.
>>  [Applause] >>  Please complete the
session evaluation.