- Hello, everybody, and
welcome to this video
in our series about Graph Data Science.
In this video we're going to learn about
the Neo4j Graph Data Science Library
and how it aims to simplify
the Graph Data Science experience.
So first, let's have a look at what
the traditional Data
Science experience would be,
especially when we're working with graphs.
So we start over on the left-hand side,
we'd be doing data modeling,
we'd be trying to work out
how we're going to shape
this data into a graph.
So perhaps we're starting with
the data in some flat files,
or maybe in a relational
database, and we've got to figure out
how we're going to build a graph.
And we've got to work out which library
we're going to use, which
algorithms we're going to use.
Maybe we're using a Python library
that we could go and find out there.
And maybe there are lots
of different libraries
to choose from, we don't
really know which one to pick,
then we find one and maybe
there aren't really any docs.
We pick a library, and we're
like, "Okay, I gotta go
and learn the syntax of that new library."
We've then got to make sure
we've got our data shaped
into the right format
to work in this library.
And then finally, we've
got to analyze the results
and figure out how we're going
to get this into production.
And so this is what the Neo4j
Graph Data Science Library
and indeed the Graph Data
Science platform aim to solve.
So like we talked about
in a previous video,
there's a set of tools
that can help us work with
Graph Data Science. So we
start with the Neo4j database,
which hopefully you'll
already be familiar with.
We've got the Cypher query language there.
And so this is a product that
allows you to natively store
and query graphs. So that's
the centerpiece of our
Graph Data Science platform.
And then on either side, we've got two tools.
On the left, we've got the
Neo4j Graph Data Science Library,
which allows us to
run graph algorithms
over this data. And then
on the right-hand side,
though we're not going to be
talking about that so much
in this video, we've got Neo4j Bloom.
And this is a tool for doing
visual graph exploration
of the results from the
Graph Data Science Library.
So what is the Graph Data Science Library?
So it's a couple of things.
So the first thing is,
it's a graph catalog.
And so this means it's an
in-memory analytics version
of the graph optimized for
running analytics workloads
on it. And then it's a set of algorithms
that can run on that in-memory graph.
And so we need to have that
in-memory graph because
we can't run the algorithms fast enough
if we're just running them
directly against the database
itself. And it's an
optimized format for doing
these types of workloads.
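To make the idea of an in-memory projection a bit more concrete, here is a minimal sketch in plain Python. It is not how the Graph Data Science Library is implemented. The sample records, the relationship types, and the `project_graph` helper are all made up for illustration. It just shows the general pattern of pulling one slice of the stored data into a compact structure that is quick to traverse.

```python
from collections import defaultdict

# Hypothetical relationship records, shaped like rows a database query might return.
stored_relationships = [
    ("Alice", "KNOWS", "Bob"),
    ("Bob", "KNOWS", "Carol"),
    ("Alice", "WORKS_WITH", "Carol"),
    ("Carol", "KNOWS", "Dave"),
]


def project_graph(relationships, rel_type):
    """Build an in-memory, undirected adjacency map for one relationship type."""
    adjacency = defaultdict(set)
    for source, kind, target in relationships:
        if kind == rel_type:
            adjacency[source].add(target)
            adjacency[target].add(source)
    return dict(adjacency)


# Only the KNOWS relationships end up in the projected graph.
knows_graph = project_graph(stored_relationships, "KNOWS")
```

Once the projection exists, an algorithm only has to look things up in `knows_graph` and never goes back to the stored records.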
So how does it work?
So there are two ways that we can use it.
So we can either have it
load a projected graph as part
of running the algorithm,
and then execute the algorithm,
or equally we can separately
load that projected graph
and then run lots of
different algorithms over it.
We'd probably be using the
one where we're loading the
projected graph while
running the algorithm
when we're just playing around
and getting the hang of it.
But then once we're in
production, we're going to want to
load that projected graph separately
and then run our algorithms over it.
Then we can either choose to
store the results directly
into Neo4j, or we can actually update
the in-memory graph itself.
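The project-once, run-many workflow can be sketched in plain Python as well. Everything here is an assumption for illustration: the `catalog` dictionary, the `database_properties` dictionary, and the `project`, `mutate`, and `write` helpers are hypothetical stand-ins and not the library's API. The two toy algorithms are degree counting and connected components.

```python
catalog = {}               # hypothetical graph catalog: name -> in-memory graph
database_properties = {}   # hypothetical stand-in for properties stored in the database


def project(name, edges):
    """Load an undirected graph into the catalog under a name, once."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    catalog[name] = {"adjacency": adjacency, "properties": {}}
    return catalog[name]


def degree(graph):
    """Toy algorithm 1: number of neighbours per node."""
    return {node: len(nbrs) for node, nbrs in graph["adjacency"].items()}


def component_ids(graph):
    """Toy algorithm 2: label every node with the first node of its component."""
    labels = {}
    for start in sorted(graph["adjacency"]):
        if start in labels:
            continue
        labels[start] = start
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in graph["adjacency"][node]:
                if nbr not in labels:
                    labels[nbr] = start
                    stack.append(nbr)
    return labels


def mutate(graph, prop, scores):
    """Keep the results on the in-memory graph only."""
    graph["properties"][prop] = dict(scores)


def write(prop, scores):
    """Store the results back in the (stand-in) database."""
    database_properties[prop] = dict(scores)


# Project once, then run several algorithms over the same in-memory graph.
people = project("people", [("Alice", "Bob"), ("Bob", "Carol"), ("Dave", "Erin")])
degrees = degree(people)
components = component_ids(people)

mutate(people, "degree", degrees)   # update the in-memory graph
write("component", components)      # store the results in the database
```

The point of the split is that the cost of building the projection is paid once, and the choice between updating the in-memory graph and storing results in the database is made per result.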
So what types of algorithms
can we run on it?
So we've looked at some of
these in an earlier video.
So there are five different types.
We've got pathfinding and
search. This is, "Hey, can I
find the shortest path
between two places?"
We've got centrality, or, "Can
I find the important nodes
in my graph?" We've got
community detection, which is,
"Can I find clusters or
partitions in my graph?"
We've got link prediction: "How
likely are nodes to form
a future relationship?"
And then we've got the
similarity algorithms,
which tell us how alike two nodes are.
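To give a feel for the question each of the five categories answers, here is a small plain-Python sketch with one very simple representative per category. These are toy stand-ins chosen for brevity (breadth-first search, degree counting, connected components, common neighbours, and Jaccard similarity) and not the library's implementations. The tiny example graph is made up.

```python
from collections import deque

# Made-up undirected example graph.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"), ("F", "G")]


def build_adjacency(edges):
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, source, target):
    """Pathfinding: breadth-first search for a path with the fewest hops."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nbr in sorted(adjacency[node]):
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return None  # no path between the two nodes


def degree_centrality(adjacency):
    """Centrality: treat nodes with more neighbours as more important."""
    return {node: len(nbrs) for node, nbrs in adjacency.items()}


def communities(adjacency):
    """Community detection: group nodes into connected components."""
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            group.add(node)
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        groups.append(group)
    return groups


def common_neighbours(adjacency, a, b):
    """Link prediction: more shared neighbours suggests a more likely future link."""
    return len(adjacency[a] & adjacency[b])


def jaccard_similarity(adjacency, a, b):
    """Similarity: overlap of the two neighbourhoods divided by their union."""
    union = adjacency[a] | adjacency[b]
    return len(adjacency[a] & adjacency[b]) / len(union) if union else 0.0


graph = build_adjacency(EDGES)
path_a_to_e = shortest_path(graph, "A", "E")
most_central = max(degree_centrality(graph), key=degree_centrality(graph).get)
groups = communities(graph)
```

Every function takes the same adjacency map as input, which is the same shape of workflow as running several algorithms over one projected graph.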
And so all of this leads to
a more pleasant Data Science
experience. We've got a
simple and standardized API,
so all of the algorithms
have the same API.
So once we've learned how
to use it for one of the
algorithms, we can just
reuse our knowledge
across the other ones.
There's documentation
and there are usage examples for
each of the algorithms.
So if we want to just
copy and paste those examples
and then adapt them to our
own data sets, we can do that.
The usage experience is very friendly.
So if you make a typo, it will give you
some sort of friendly error message.
We have procedures that can
compute memory requirements.
So you can check before you run it,
and make sure you're not going
to run out of memory by
running an algorithm. And then
once we've got the results,
we can go and explore
them in a tool like Bloom
to make sure they even
make any sort of sense
before we productize it.
So all of this leads,
hopefully, if we go back to
our initial slide, to a simplified
Data Science experience.
So data modeling is not such a problem,
because with Neo4j the data is already a graph.
We've got easy docs to work with.
We've got algorithms
that have been validated
on big datasets, and there are
tutorials to go with these.
The Neo4j syntax is
standardized and simplified, and
you can reshape your data very easily
just with a simple command.
And then we can easily
write the results to Neo4j
and move it straight into production.
That's the end of this video on the
Neo4j Graph Data Science Library.
Hopefully now you've got a
good understanding of how this
makes the Graph Data Science
experience much better.
And in our next video,
we're going to look at
how we can actually go
about using this library.
