- [Learner] Hello? Can you hear us?
Unfortunately you're muted.
- Sorry I think you can hear
me now, sorry about that.
I'll start again.
So just check that you can hear me now.
You can hear me now?
- Yes, we are good.
- Perfect, thank you very
much, sorry about that.
Okay, so um, thank you
for joining me today,
I'm going to be providing an
overview of graph data science
and the Graph Data Science
Library with Neo4j.
So very quickly, what
is graph data science?
So it's basically data science
where relationships matter.
So it's all about thinking,
how can we incorporate
that information about not just
the data, but how that data
is connected to other data?
And how do we incorporate
that connectivity
between the elements.
Broadly speaking,
there are two categories
that this might fall into
when we start thinking
about this.
And some of this can
be from a local aspect.
And what we mean by this is,
we've got specific queries
we want to ask because
there are specific things
that we're looking for.
We know what we're looking
for, we can put a finger on it.
And this tends to be fairly local.
And these also tend to
be very quick things.
And they'll be around some decision making
or pattern matching.
And if we think about
some examples of this,
these will be things like
for a person called Bob,
tell me, you know,
who he's connected to
up to four hops out.
Maybe I want to know about
how two people are connected,
you know, shortest path between them,
maybe I want to ask questions
such as for a given pattern.
So, as an example, I want
to look at a pattern.
If we're looking at a movie
database, for example,
I want to bring back all of
the people in this database
that have been actors
as well as directors.
So there's something very
specific but you know
what you're looking for.
Again, we're using the
relationships but these
are very localized type of
questions that we're asking.
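
As a rough illustration of these local questions, here is a minimal Python sketch, assuming a small in-memory friendship graph. The names, the `friends` data and both helper functions are made up for the example; they are not part of Neo4j or the library.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {node: hop_count} for nodes reachable from start in at most max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]
    return distances


def shortest_path(graph, source, target):
    """Breadth-first search for a path with the fewest hops; None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical sample data: who knows whom (undirected, stored both ways).
friends = {
    "Bob": ["Alice", "Carol"],
    "Alice": ["Bob", "Dave"],
    "Carol": ["Bob"],
    "Dave": ["Alice", "Erin"],
    "Erin": ["Dave"],
}

print(within_hops(friends, "Bob", 2))
print(shortest_path(friends, "Bob", "Erin"))
```

Both questions start from a known node and only touch its neighborhood, which is why this kind of query tends to be quick.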
The other side of this will
be, so there's global analysis
and iterations, and
this tends to be driven
by graph algorithms.
And we're gonna touch
on the different types
of graph algorithms
that there are shortly.
But this is very much
the case of saying,
let's go across the entire
graph, and let's start
to understand what are the
qualities of that graph?
So what are the qualities
of the elements in there
and how they're connected?
And this is about
understanding the structure
coz there'll be certain features in there,
that we're not necessarily
going to be able to articulate
with a localized query,
but when we look across
the entire graph
structure, we start to get
some very interesting
features that we can go away,
take away and use that
understanding to process further.
So when we think about graph data science,
there's lots of elements
that go along and build out
what graph data science is.
And there's going to be a number of steps
that we can go through to think about.
So the first one will
be the graph statistics.
And this is really basic understanding,
the measures of what is the graph?
And these will be the
kinds of things such as,
how many nodes do I have?
How many relationships do I have?
What is the average node
relationship density?
So for example, maybe if I'm
looking at who people know
via LinkedIn, so who are
people connected to LinkedIn,
maybe I'm going to get an
average node density of,
say, 300 connections, if I
was looking at a hierarchy,
so product hierarchy, and
I wanted to have a look
at what was the average product
category down to product,
maybe I'm looking at five
connections average, and so forth.
So let's have a look at
the very basic statistics
around the graph, coz that's
going to start to describe
what kind of questions we
can ask in the structure
and what models we're working with.
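
A minimal Python sketch of these basic statistics, assuming the graph is given as a list of nodes and a list of (source, target) pairs. The product hierarchy data and the function name are invented for the example.

```python
def graph_statistics(nodes, relationships):
    """Count nodes and relationships and compute the average and maximum degree."""
    degree = {node: 0 for node in nodes}
    for source, target in relationships:
        degree[source] += 1
        degree[target] += 1
    node_count = len(nodes)
    return {
        "node_count": node_count,
        "relationship_count": len(relationships),
        "average_degree": sum(degree.values()) / node_count if node_count else 0.0,
        "max_degree": max(degree.values(), default=0),
    }


# Hypothetical product hierarchy: one category with five products.
nodes = ["Category", "P1", "P2", "P3", "P4", "P5"]
relationships = [("Category", product) for product in nodes[1:]]

stats = graph_statistics(nodes, relationships)
print(stats)
```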
Then we can start thinking
about graph analytics.
So this is where we're
taking it a bit further
and we're starting to ask
more specific questions
of our graph and we're gonna
get more specific statistics.
So if we're starting to
run some graph algorithms,
we're going to get some
information back about
what's coming from that.
And also there's more specific questions.
And then the third one
is, how can we use graph
algorithms and so forth,
to enrich our existing
machine learning pipelines.
As a very quick overview of
this, many models that are used
tend to have input data and
the input data doesn't take
into account that it may be
related to other input data
going into a model.
So what can we do?
How can we use graph
data science to make sure
that we're reflecting in
some way the connections
between our data that may already exist?
So I'm not going to go into a
huge amount of detail on that,
there is content that's covered that before,
but what I am quite keen to do is talk
about the Neo4j Graph
Data Science Library.
And for those of you who have used
the Neo4j Graph Algorithms library before,
the Neo4j Graph Data Science
Library is the successor
of that one.
So it was previously
a Neo4j Labs project.
It is now a fully
supported Neo4j product.
And its emphasis is on
supporting data scientists.
And there's a lot of elements
that come through it.
So it is an enterprise grade product.
So it scales, it's very
powerful, it's very practical,
it's got simplified syntax,
and we're going to have a look
at some of that as well.
And it's very much targeted at being able
to empower data scientists
and data science projects,
to be able to run these features
and do graph data science,
at scale and in production.
So I'm gonna briefly talk about
the different graph algorithm types,
and then we're gonna touch
on which ones are implemented
within the Neo4j graph
Data Science Library.
So one of the categories will
be pathfinding and search.
And as the name implies,
it's all about things
such as shortest paths or
finding the shortest path
between points, weighted too. Maybe
we're going to be using this
in something such as route
planning, or we may be looking
at some kind of path expansion.
So if we're
looking at a supply chain,
sort of branching out in breadth and depth.
The next one is centrality.
So this is all about finding
important or influential points
within our network,
and an example of where
we might be using this, so
let's say if we were looking
at a network of friends, we want to find
who are potential
influencers in that network.
So that would be an example of why would
centrality be important,
or another example
maybe if we're looking at an
IoT network.
And we want to find out what
are potential single points
of failure, so areas which
are really well connected
and we want to try and find those so that
we can put in redundancy and so forth.
And the thing with the
centrality, we may be looking
at how a node is connected
to its immediate neighbors.
But also centrality is
going to be driven by how
its neighbors are connected,
and how its neighbors' neighbors
are connected and so forth.
Next one along is community detection.
And as the name suggests,
we're looking for communities.
So this is all about trying to
group together similar nodes
to get similar entities together.
An example where we might be using this is
if we were trying to, for
example, group together products
that people are buying based
on the purchasing behaviors
and the structure of
the relationships there.
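
To make the grouping idea concrete, here is a small Python sketch of one of the simplest community detection approaches, weakly connected components, using a union-find structure. It is a textbook version under assumed sample data, not the library's implementation; the product names are made up.

```python
def weakly_connected_components(nodes, relationships):
    """Group nodes that are linked by any chain of relationships, ignoring direction."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target in relationships:
        root_source, root_target = find(source), find(target)
        if root_source != root_target:
            parent[root_source] = root_target

    components = {}
    for node in nodes:
        components.setdefault(find(node), set()).add(node)
    return sorted(components.values(), key=sorted)


# Hypothetical "bought together" relationships between products.
products = ["tent", "sleeping bag", "stove", "laptop", "mouse"]
bought_together = [("tent", "sleeping bag"), ("sleeping bag", "stove"), ("laptop", "mouse")]

communities = weakly_connected_components(products, bought_together)
print(communities)
```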
Another one is link prediction.
And this is all about trying to find
how entities might be connected.
And this is going to be based
on how they're connected
to other entities, and whether there's
sort of a synergy there.
So there's a number of algorithms
that sit around that.
And for example, how you might use this is
if you are trying to suggest
people you may know.
If you think about Facebook or LinkedIn,
and it suggests people you may know,
that's not a bad example of trying
to find those hidden links.
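
One simple way to score "people you may know" is to count common neighbors. The sketch below is a minimal Python version under that assumption; the network data and function names are invented, and real link prediction algorithms use a range of other scores as well.

```python
def common_neighbors(graph, a, b):
    """Number of neighbors that a and b share."""
    return len(graph[a] & graph[b])


def suggest_connections(graph, person):
    """Rank people who are not yet connected to person by shared neighbors."""
    scores = {}
    for candidate in graph:
        if candidate == person or candidate in graph[person]:
            continue  # skip the person themselves and existing connections
        score = common_neighbors(graph, person, candidate)
        if score > 0:
            scores[candidate] = score
    # Highest score first, then alphabetical for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical undirected network, stored as neighbor sets.
network = {
    "Ann": {"Ben", "Cat"},
    "Ben": {"Ann", "Dan"},
    "Cat": {"Ann", "Dan"},
    "Dan": {"Ben", "Cat", "Eve"},
    "Eve": {"Dan"},
}

print(suggest_connections(network, "Ann"))
```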
And the other grouping here is similarity.
And this is all about
finding common answers
based on the graph properties.
So, for example, if you're trying to do
entity resolution, you're looking for
lots of common properties,
or a lot of common
relationships, to group entities together.
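
A common way to measure this kind of overlap is the Jaccard score: shared items divided by all items. Here is a minimal Python sketch for the entity resolution case; the two records and their property values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical customer records described by their properties.
record_a = {"email:j@example.com", "phone:555-0100", "city:London"}
record_b = {"email:j@example.com", "phone:555-0100", "city:Leeds"}

score = jaccard(record_a, record_b)
print(score)
```

Two shared properties out of four distinct ones give a score of 0.5, which might be enough to flag the records as candidates for the same entity, depending on the threshold you choose.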
So a very brief overview.
And what we've got currently available
in the graph Data Science Library
today, are all of the ones
that are listed up on the slide.
So I'm not going to talk
through all of them.
I will leave this slide up
for a couple of moments,
and also I'll draw
your attention to
the five algorithms that are in bold,
so you can see node
similarity in similarity,
PageRank in centrality, and
weakly connected components,
label propagation and Louvain modularity
in community detection.
So we'll touch in a moment on
why those ones are in bold.
But you get the flavor, we've got
quite a few grouped together.
And you may have noticed
in the bottom right corner,
we've also got another
group of algorithms
that don't necessarily fit in
the five that we've got here.
But these are either sort of
helper functions that don't fit
into those categories, or they will
be something that is run
as part of the existing
algorithms to help them work.
So very quickly, I want
to touch on the syntax.
So one of the features that's come through
with the new graph Data Science
Library, is that the syntax
is uniform across all of the algorithms.
So irrespective of what
algorithm you're going to run,
you're always going to use this syntax.
So let's just quickly step through this.
So you use CALL because you're
gonna call the functions,
and you always have the prefix gds,
obviously, that's referring to the library,
and this section here,
tier, is the tier of support.
So this is referring to
the slide previously,
where I mentioned, we have
some of the bold algorithms
and the non-bold algorithms.
So we're gonna touch
on the next slide
on the differences between them.
So we're gonna call gds
dot the algorithm name,
which is the algorithm
we're going to execute, then
the execution mode, and
there are three modes
so we can either write, where what will happen is,
if we've specified a relationship
or property that we want
to write the results back to,
when we call the write mode,
it'll write the results back.
Alternatively, we can stream the results.
So this won't make changes
to your underlying data.
But if you've got a downstream
process that you want to take
the results of the algorithm
and go and do something with them,
so for example, some real-time
recommendations,
or some real-time fraud
triggers, you can do that.
And the third mode that
you've got is stats.
So it gives you the statistics,
the statistical
output of the algorithm.
So for example, how many
relationships we use and so forth.
And the next bit you've got is estimate.
So this is an optional add-on
you can put in
when you're calling the
algorithm, and what this will do
is without running the
algorithm, it'll estimate
based on the information you've provided.
So what graph are you going to work with?
What is the configuration
that you've supplied?
It will estimate the amount
of memory that you need
to run the algorithm.
So this is really useful,
especially if you're working
with very large data sets
where memory is paramount.
And now I'm going to touch
on the tiers and the bolding.
So there are the three tiers of support.
And there's more information
on the next slide.
So the ones in bold are the product-supported
graph algorithms.
And those ones come with
all of the write, stream
and stats options, as well as the estimate.
Whereas the ones that
are not in bold don't fit
into that category.
So they don't have the
stats option available,
and they won't be able
to give you an estimate
of how much memory the
algorithm's going to use.
But as you can see, everything
follows the same syntax.
And then here you've got graph name.
So what you can do is, if
you're going to be running
a lot of graph algorithms
on the same data set
or if you're going to be working
with the same graph algorithm,
but you want to be adjusting
the different parameters
and so forth, what you can do is
you can define what your graph
is going to be, and the
graph that you define,
that you want to run
the graph algorithms on,
doesn't have to have the same structure
as the data in your graph database.
You can basically say, well,
I want to skip some relationships.
And I'll just bring this in,
you have that flexibility
to tailor what the graph looks like.
You can give it a name.
And then you can keep running
the function over and over.
And then you also give some configuration
and in that configuration,
you're going to specify things
such as any algorithm-specific
parameters,
along with, if you're
writing, where you're
going to write your results to.
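
The pieces of the syntax described above can be put together roughly as follows. This is a minimal Python sketch that only builds the call string and its parameters; it assumes the product-supported tier has no tier segment in the name, and the helper, the graph name "actors", the config key and "someAlgorithm" are illustrative placeholders rather than anything shown on the slide.

```python
VALID_MODES = ("write", "stream", "stats")
VALID_TIERS = (None, "beta", "alpha")  # None stands for the product-supported tier


def build_gds_call(algorithm, mode, graph_name, config=None, tier=None, estimate=False):
    """Assemble 'CALL gds[.tier].<algorithm>.<mode>[.estimate](graph, config)'."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown execution mode: {mode}")
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier: {tier}")
    if tier is not None and (mode == "stats" or estimate):
        # As described in the talk, stats and estimate only come with the bold algorithms.
        raise ValueError("stats and estimate are only described for product-supported algorithms")
    parts = ["gds"]
    if tier is not None:
        parts.append(tier)
    parts.extend([algorithm, mode])
    if estimate:
        parts.append("estimate")
    query = "CALL " + ".".join(parts) + "($graphName, $config)"
    params = {"graphName": graph_name, "config": dict(config or {})}
    return query, params


query, params = build_gds_call("pageRank", "stream", "actors", {"maxIterations": 20})
print(query)
print(params)
print(build_gds_call("pageRank", "write", "actors", estimate=True)[0])
```

Passing the graph name and the config map as query parameters keeps the call text the same when you rerun the algorithm with different settings.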
So if we have a quick look
at the tiers of support,
so everything that was in bold,
that was all the product
supported algorithms.
So what that means is
that these are the ones
that have been fully tested
for scale, for stability,
that they've been optimized, and
if you run into problems
with these ones, these ones are
fully supported by the team.
And then the next group we've
got is the beta algorithms
and these ones are candidates
for the product-supported tier.
So these ones are currently
sort of going through
the phase of the steps that
need to be done for stability,
optimization, and so forth.
And so they are likely
candidates that will
eventually become the bolded
ones that are supported
that have all the options
around memory estimation,
and stats and so forth.
And then the other group that we've got
is the alpha group, and these
are the experimental ones.
So these are the ones where we're
testing out new algorithms,
we're looking to see how they work.
We're trying to understand
some popularity behind them.
And the thing to bear in
mind with the alpha ones
is that there are no
guarantees around them.
They may be removed in any future version
of the Graph Data Science Library,
and they can be changed at any time.
So that's something to bear in mind,
but they are still available.
So where can you find the
Graph Data Science Library?
So if you've got Neo4j Desktop,
you can install it straightaway from there.
So that's one of the available plugins.
Something to bear in mind
as well is 4.0 support
for the Graph Data Science Library will be available
at the end of the month.
So it's not ready just yet,
but it's just coming very soon.
So you can access the
Graph Data Science Library
from Neo4j Desktop.
You can use the graph data science
or graph algorithms playground,
I'm going to have a look at that shortly.
And this is a really cool
tool that allows you to try
all the different graph algorithms,
without having to worry about the Cypher.
And in fact, it will
give you a Cypher snippet
that you can then go away and use.
And the other option you've got as well
is you can work with
the Graph Data Science Library
in a Neo4j Sandbox.
So here, you don't need
to install anything
or set anything up or download.
You can run the sandbox and you've got
between three and 10 days to work with that
and you've also got various
visual interfaces you
can work with.
So you've got three options there.
And what you can do as well
is if you want to just access
the plugin on its own without going
through those interfaces, you
can go to the Download Center
and get a copy there.
And you've also got a link
to the documentation.
So let's have a quick look
at the graph algorithms playground.
I'm just going to switch my screens.
Okay, so here I'm in Neo4j
Desktop, I've got a graph here.
It's an empty graph, I've
not installed anything yet.
And what I'm going to do is I'm going
to add a couple of plugins.
So the graph
algorithms playground
uses APOC, so I'm going to add APOC,
and it also uses the Graph
Data Science Library,
which I'm going to add
there. So effectively,
in Desktop, you have the
option to add in the plugins.
And you can see we've got the
Graph Data Science Library
there to add as well.
So I'll add that shortly, once APOC
has been downloaded and installed,
and it's now going to install this one.
Excellent.
So to install the graph
algorithms playground,
you go to the graph applications section here.
And if you've not
already got it installed,
if you go to the Graph Apps
Gallery, so you go there
and click on the arrow, it will give you
all the available graph
apps, and there you'll see
you've got the graph
algorithms playground.
So I'll just show you that quickly.
So there you go, it's available there
so you can install that.
I have it already installed,
as you can see here.
So what I'm going to do
is start my database.
So I'm going to start that.
So this is an empty database
I've not put any data in yet.
So we're gonna quickly
add some data in first
before we start with the playground.
And I'm going to use the
movie database sample code
that you get when you
run Neo4j Browser.
So let's do that shortly
once the database is started.
So it's worth mentioning,
whilst it's starting,
that what you have as well
is some sample data.
So if you don't have any
data in your database,
and you want to run some
of the pre-canned stuff,
you can do that.
So there is an option there.
They've got three data sets,
which you can click on and run.
So I will show you that quickly as well.
So I'll also show you with some
data stored in the database.
It's taking a bit longer than expected.
We're good, okay, it took a
bit longer than expected,
so the database is now up and running.
So let's quickly put some data in.
So I'm going to open
browser and I'm just going
to load the Movie Database graph example.
Okay, I'm just gonna do the right code.
Open database
I'm going to move on now.
So something to bear in mind with
the graph algorithms playground
is it runs on your data
as is, so you can't
sort of alter it and go,
well actually, I know I've
got Person ACTED_IN Movie,
ACTED_IN by Person, I want to skip the Movie,
and I just want to run person to person.
So you don't
have that level of control.
It's Cypher-free.
So what I'm going to do
is add
an extra relationship, coz
I'm going to show you how
I run PageRank on this.
So what I'm going to do is
show you the DB schema.
You can see we've got
Person and a number of ways
it interacts with Movie.
And what I'm interested in is
I wanna do PageRank on actors,
so people who are actors and how they
interact with each other.
So I wanna create a new
relationship, Person
ACTED_WITH Person.
So I'm just gonna quickly do this.
Excellent, so
basically, I'm matching
people who acted in the same movie,
and then I'm creating
a relationship between them.
Okay, we're now just
gonna create the relationship
between the persons.
So I'm gonna run that quickly,
and when I refresh my schema,
you can see now
I've added the ACTED_WITH relationship
on Person. Fantastic.
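
The query itself isn't captured in the transcript, so as a rough stand-in here is a minimal Python sketch of the same idea: take (person, movie) acted-in pairs and derive person-to-person co-acting pairs. The sample rows and the function name are illustrative, and the Cypher in the comment is an assumption about what such a query might look like, not the one used in the demo.

```python
from collections import defaultdict
from itertools import combinations

# Assumed Cypher equivalent (not shown in the talk):
#   MATCH (a:Person)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(b:Person)
#   WHERE id(a) < id(b)
#   MERGE (a)-[:ACTED_WITH]-(b)


def acted_with_pairs(acted_in):
    """Turn (person, movie) rows into unique, sorted person-to-person pairs."""
    cast = defaultdict(set)
    for person, movie in acted_in:
        cast[movie].add(person)
    pairs = set()
    for people in cast.values():
        # Sorting each cast means every pair is stored once, in a fixed order.
        pairs.update(combinations(sorted(people), 2))
    return sorted(pairs)


# Illustrative rows only.
acted_in = [
    ("Tom Hanks", "You've Got Mail"),
    ("Meg Ryan", "You've Got Mail"),
    ("Tom Hanks", "Sleepless in Seattle"),
    ("Meg Ryan", "Sleepless in Seattle"),
    ("Tom Hanks", "Apollo 13"),
    ("Kevin Bacon", "Apollo 13"),
]

pairs = acted_with_pairs(acted_in)
print(pairs)
```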
So let's start up the playground.
So to start up apps that
aren't Browser, just press
the down arrow where it says Open,
and I'm going to open up the
graph data science playground.
Excellent. And then you can see here
the different algorithm categories here.
So the only one that
we haven't got here is
the link prediction one.
So I want to do some community detection.
I'm going to select that one,
no, it's not community detection I want to do,
I want to find centrality,
so I'm gonna select PageRank.
So I'm gonna select a
label here, which is going
to be Person, and I'm
selecting here ACTED_WITH,
coz that's what we just created.
And I'm going to use
all the default settings.
I don't have any weight
property, so I'm leaving that blank.
I'm not going to store the results,
I just want to stream them.
I select the number of concurrent
processes it's going to run,
and then the iterations,
and so these now,
these two, are the
PageRank-specific parameters.
So I've set that up and I didn't
have to write any Cypher,
I can now run that as a query.
And it's gonna go away and
find out the results for me.
So we can see here,
Tom Hanks has got a nice,
nice high score back,
based on his fellow actors.
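
To give a feel for why a well-connected actor scores highly, here is a minimal textbook PageRank in Python using power iteration. It is a sketch under assumptions, not the library's implementation: the damping value of 0.85, the default of 20 iterations and the tiny co-actor graph are all chosen for the example, and scores from the library may be scaled differently.

```python
def pagerank(nodes, relationships, iterations=20, damping=0.85):
    """Textbook PageRank by power iteration; scores sum to 1."""
    out_links = {node: [] for node in nodes}
    for source, target in relationships:
        out_links[source].append(target)
    count = len(nodes)
    scores = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        updated = {node: (1.0 - damping) / count for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * scores[node] / len(targets)
                for target in targets:
                    updated[target] += share
            else:
                # A node with no outgoing links spreads its score evenly.
                share = damping * scores[node] / count
                for other in nodes:
                    updated[other] += share
        scores = updated
    return scores


# Illustrative co-actor graph; each undirected pair is added in both directions.
actors = ["Tom Hanks", "Meg Ryan", "Kevin Bacon", "Bill Paxton"]
co_actors = [("Tom Hanks", "Meg Ryan"), ("Tom Hanks", "Kevin Bacon"), ("Tom Hanks", "Bill Paxton")]
directed = co_actors + [(b, a) for a, b in co_actors]

scores = pagerank(actors, directed)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

The actor connected to everyone else ends up with the highest score, because every other node passes its whole share to him on each iteration.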
And again, I can click on visualization.
And it will bring up some of the nodes
and render some of the
connections there so you can start
to get a flavor of that, and
I can have a look at some
of the weights put on there.
So I'm getting information
around how different actors
have been joined.
And if I want to take that code away,
and I go, actually, I'd like to take that code
and run it in Browser, you
can click on the code button.
And here you've got a set of parameters.
So this was the number of
rows I want to send back,
the limit; you can see it uses a limit.
And in here, in the params config,
I'm specifying all the bits and pieces
that we would use when calling the algorithm.
And if you remember,
we talked about
that config map where
we put the information in.
So that's what's going on here.
So that's just creating
that, and we add it in.
So that's an example.
You can copy that out,
then copy the code here
and use the parameters
and use the code, and you can
run that in your Browser
rather than running it through the tool.
And I talked about the datasets
that are available as well.
So if you look at the bottom here,
we've got the yellow database symbol.
Here you can also load
some of the data as well.
So you've got Game of
Thrones, European roads,
and Twitter datasets.
So you can use those to
play with
using the graph data science playground.
So you have that option as well.
So let's quickly go back to this.
So, um, quick mention before we finish up.
So if you haven't already, do check out
the graph algorithms book,
it is free, free PDF,
but it's only free until next Wednesday.
So if you haven't already
got it, get yourself a copy.
So it talks through
examples of why we use
the different graph algorithms,
and how you would use them
both in Spark and Neo4j.
So you've got some great examples there.
So this is based on the previous
graph algorithms library.
But Tomaz Bratanic has put together
these fantastic porting
examples of how you would take
the examples that are in
the book and recreate them
in the Graph Data Science
Library, now available.
So again, we talked about the plugins.
So you can either download
Desktop and have it
all plugged in via that, or
you can go to the Download Center,
get the jar and do that manually.
If you want to have a play with
the graph data science
sandbox, you don't need
to install anything, you don't
need to download anything.
Here's the link that gets you
directly to that sandbox there.
And you've got the documentation there.
And with this, thank you very
much. We're running out of time,
so I'll have a quick look if
you've got any questions,
so if you've got any, do share them.
So let's have a look, any questions?
Okay, so what I will say
is the next session will be
on APOC and supercharging your projects.
You've got the link here.
If you look at the top
of the chat window, you have got
the address pinned up there.
So you should see that shortly.
Thank you very much
