- Hello there everyone. I am Karin Wolok. I am the Community Manager for the Developer Relations team at Neo4j. And I am here with my colleague, Mark Needham, who is also one of our Developer Relations engineers, and another one of my colleagues, Andy Jefferson, who is our presenter today and will be talking about deep learning on graphs. So, Andy, I will let you introduce yourself, if you would like.
- All right, hi everyone. Yeah, I'm a software engineer at Neo4j, and I also work as a researcher at an organization called Octavian. So I'm gonna talk about the research that I do at Octavian on knowledge graphs.
- [Mark] Yes, let me just quickly do a little bit of housekeeping before you get started, Andy. If you have any questions while Andy's presenting, feel free to ask those in the YouTube chat we've got on the right-hand side. Sometimes YouTube doesn't play the stream at the proper resolution, so on the bottom right-hand side of your control panel there's a little cog with an HD icon on it. Set that to at least 720p, and you'll be able to see everything more clearly. Other than that, I'll hand back to Andy.
- [Karin] I do wanna also add: if anyone out there is interested in sharing on one of our online meetups in the future, you can go to our community site and post your project under the Projects or Content categories, for example if it's a blog post, and that's what we're gonna be selecting from for our future talks. So you can also vote there for things that you wanna see in the future.
- [Andy] Awesome, OK, I hope
you guys can see the screen.
- [Mark] Yeah, we're all good.
- [Andy] Awesome.
Cool, so I'll talk about Graph AI. I've already introduced myself a bit, but just to make this super clear: I'm an engineer on the cloud team at Neo4j. We're building Neo4j as a software service in the cloud, which is a pretty exciting thing that's coming up. But today I'm talking about the research that I do on artificial intelligence at Octavian, and Octavian is an open source research organization.
So, if you're not familiar with graphs: a graph is made up of nodes, which are connected by edges, and in Neo4j we have the property graph model.
So, in the property graph
model, both nodes and edges
can have properties.
So, a node like an employee can have properties like name, date of birth, ID number, maybe country of residence.
And the relationships can have properties,
as well as having a relationship type.
So, here we've got two
different types of relationship,
like location and CEO.
Now, the image below is the graph model that DeepMind used in a recent research paper. The reason I pulled that out is to show that these two models are the same thing. It's a more mathematical formalism, but you've got the same information: there are nodes connected by relationships, and both the nodes and the relationships have properties. DeepMind calls the properties attributes, and they're often thought of as vectors. But the models are the same, and we can transform between the two of them.
So, those are the graphs we're concerned with. And graphs are really powerful at expressing all kinds of different knowledge, information, and data. So, why are we interested in these graphs? Well, like I said, they're really powerful, and we can use them to represent all kinds of things.
The graph has been around as a concept since Leonhard Euler, the guy in the top left here, who is famous within graph theory for the Seven Bridges of Königsberg problem, which is the first recorded mathematical graph problem.
And there I've got some other examples of graphs that we think about. In the top right there's a semantic graph, where we're mapping out a kind of idealized set of relationships that we can use to describe the world. In the bottom left I've got more of a database graph, where we've got lots of information about different instances of customers and products interacting with each other, to keep track of things like how many customers we have.
And in the bottom right I've got a transit network graph, which anyone who's lived in a city is probably familiar with. Each of these representations is useful to us in a different way, but they're all graphs, and they're all made up of nodes and edges. So, we're interested in graphs because graphs are everywhere. We can represent lots of knowledge in graphs, and if we know how to manipulate graphs cleverly, then we can understand and interact with all of that knowledge.
The question is, what is AI? I'm not gonna try and answer that, because I don't know the answer. What I am gonna talk about is what deep learning is. So, this is deep learning's building block. If this was interactive, I'd ask how many people have seen this before; typically, when I ask that question, it's about 50% of the audience. What this is, is the densely connected layers in a neural network.
In this neural network, each item in each layer is fully connected to all of the items in the previous layer. We've got two hidden layers, and that's where the "deep" comes from: when we talk about deep neural networks, the depth is the number of layers in the neural network. And these fully connected layers, where, through machine learning and gradient descent, we learn the parameters of how nodes in one layer interact with nodes in the next layer, are the building block of a whole range of AI innovations that have come around recently. The fundamental building block of these neural networks is usually the densely connected layer.
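To make that concrete, here's a minimal sketch (my own illustration, not anything from the slides) of a forward pass through densely connected layers in numpy; the weight matrices are the parameters that gradient descent would learn:

```python
import numpy as np

def relu(x):
    # Non-linearity applied between the dense layers
    return np.maximum(0, x)

def dense_network(x, params):
    """Forward pass through dense (fully connected) layers.

    params is a list of (W, b) pairs: the weights that would be
    learned by gradient descent during training.
    """
    for W, b in params[:-1]:
        x = relu(x @ W + b)   # every input feeds every output
    W, b = params[-1]
    return x @ W + b          # final layer: raw scores

# Two hidden layers ("deep"): 4 inputs -> 8 -> 8 -> 3 outputs
rng = np.random.default_rng(0)
params = [(rng.normal(size=(4, 8)), np.zeros(8)),
          (rng.normal(size=(8, 8)), np.zeros(8)),
          (rng.normal(size=(8, 3)), np.zeros(3))]
print(dense_network(rng.normal(size=4), params))
```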
So, what are they being used for? AlphaGo is probably the most famous application of deep learning. AlphaGo used deep reinforcement learning to train a computer to play Go better than some of the best Go players in the world. So this is a great example of AI outperforming humans using deep learning.
And there are a few others we can pick out as well. Here are some from the world of image recognition. On the left there's output from a neural network called MACnets, of which I am a massive fan. This shows multi-step reasoning from a neural network at the same time as it's doing image processing. On the left we've got the question that was asked. So, this is combining image recognition, natural language processing, and multi-step reasoning, which is really awesome.
The question this was asked was: what color is the matte thing to the right of the sphere in front of the tiny blue block? In the central column, we can see which of the words the neural network was focusing on at each stage of reasoning towards the answer. And on the right we can see which part of the image the AI was focusing on in each of those reasoning steps as it worked towards the answer. So, it starts by finding the sphere, and then, finally, it picks out the cylinder, which is the answer to the question.
So, this is really awesome, and research like this has inspired a lot of our work. On the right-hand side you've got some really complex problems. The top-right one is an image classification problem, but it's classifying multiple parts in the same image. And at the bottom right we've got a problem where the AI is predicting where a car is most likely to go. It's doing path prediction, but from a still image.
So, this has learned things about cars: that cars tend to go forward, not into trees, and tend to drive along roads. That takes a really sophisticated piece of AI, and all of this AI is performing as well as or better than humans on the task. So, we wanna take this superhuman-performing AI and apply it to the world of graphs, which we use to represent all kinds of information, from the stock in our store to how we travel around big cities.
Before we get to that, let's take a step back and think about what machine learning is, and what the machine learning process and interface look like. In machine learning, what we generate is a trained model. We have a model that can learn and is meant to understand some data; we train it by giving it lots and lots of examples of data, and from that it learns a set of parameters which allow it to predict some outcomes from the data. When we've trained the model, there are basically two things that we can do with it. We can look at the parameters, or the weights, that it has learned, or we can just use it to make predictions. Sometimes we do one or the other, sometimes we do both.
To make this a bit more concrete, we can think about linear regression within this framework. In linear regression, our model is y = mx + c, and, hopefully, most of you are familiar with that model. In this model, the parameters or weights that we learn are m and c, and the training data that we use to train the model is a list of x and y values. So, x is the input value, and y is the output that we're predicting. We train it with loads and loads of values, and then, if we wanna figure out what y is for a given x, we can plug it into the model with the parameters that we've learned.
Sometimes, in something like this, what you actually care about is the parameters m and c themselves, and less about the predictive nature of the model. At other times, you care much more about the predictive nature of the model and not at all about the parameters.
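As a hedged aside, here's what that looks like in code: a minimal sketch of fitting y = mx + c by gradient descent on a toy list of x and y values, after which we can either inspect the learned parameters or use them to predict:

```python
import numpy as np

# Training data: x inputs and the y outputs we want to predict
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                     # true m=2, c=1

m, c = 0.0, 0.0                       # parameters (weights) to learn
lr = 0.05                             # learning rate

for _ in range(2000):
    pred = m * x + c                  # model: y = mx + c
    error = pred - y
    # Gradients of mean squared error with respect to m and c
    m -= lr * 2 * np.mean(error * x)
    c -= lr * 2 * np.mean(error)

print(m, c)           # converges close to m=2, c=1 (the weights)
print(m * 10.0 + c)   # or use the model to predict y for a new x
```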
The same approach applies to classifiers, like the neural network image classifiers that I showed before. We train them in exactly the same way: we have a model, which is a neural network, and that model takes a whole load of parameters. Here w, the parameter of our model, is actually a matrix of maybe thousands or hundreds of thousands of values, and the output is the category of the image.
So the training data could be ImageNet, for example, which is a whole collection of pictures and, for each picture, a classification: this is a picture of a dog, this is a picture of a cat, this is a picture of a car, and so on. We take those hundreds of thousands of images and we train the model, and it learns a whole set of weights, and those weights enable the neural network to classify images.
Another example of machine learning is in natural language, and this is an example where we use the weights rather than the predictive power: it's where we generate a word embedding. That would be something like word2vec. word2vec is a model which converts words, essentially a whole dictionary of words, into vectors. Those vectors are then used for further steps in different kinds of machine learning or predictive processes.
The way it's trained is by predicting the probability that certain words will appear in the context that surrounds a given word, and we train a neural network to predict that. But with this training, we don't actually care about the ability to predict which words will appear in the same sentence as a particular word. What we care about is the resulting word embedding, which is the trained weights that we get out, and the fact that that word embedding has the property that similar words have similar embeddings.
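As an illustration of that workflow (using the gensim library and a toy corpus, neither of which is mentioned in the talk), we train on the context-prediction task but keep the embedding weights:

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens
sentences = [
    ["graph", "database", "stores", "nodes", "and", "edges"],
    ["neo4j", "is", "a", "graph", "database"],
    ["nodes", "and", "edges", "have", "properties"],
]

# Train skip-gram word2vec; the training task is predicting context
# words, but what we keep is the learned embedding matrix.
model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, sg=1)

print(model.wv["graph"])                  # the embedding: learned weights
print(model.wv.most_similar("database"))  # similar words, similar vectors
```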
So, the question was, what is deep learning? Deep learning is machine learning, training big models on data using gradient descent. Don't worry if you're not entirely comfortable with all of that; as long as you understand this high-level view of what we're trying to do, that's fine.
So, how do we take this deep learning and apply it to graphs? This is probably the scariest slide in the whole presentation. What we were trying to do at Octavian was, having taken that step back to understand what the process of machine learning is, to ask: how would we like machine learning on graphs to work?
We thought about image classifiers, and the way they work is they learn to predict the probability that an image belongs to a category, right: is it a horse, a dog, a cat, a car? So, it predicts a probability for each category, given an input image, and it does that by applying a convolutional neural network to the image. That's great. What we wanna do with a graph is, perhaps, take a section of the graph, a subgraph, and predict some category it belongs to, or, perhaps, predict the class of a node, based on the subgraph around that node.
So, an example prediction that we might wanna make is: which way are you gonna vote? Are you gonna vote Republican, or are you gonna vote Democrat? And we might make that prediction based on the subgraph around you, one hop or two hops out. Now, we don't wanna do that, because we don't want to subvert democracy, but that's an example of a prediction on a graph. So, we wanna research how we can apply machine learning to graphs and to subgraphs. We wanna train it on subgraphs or nodes from graphs, and predict outcomes, regression, classification, or embedding, based on those.
So that was our goal.
Now, there are some existing mechanisms that you can use on a graph, such as node2vec, which we didn't like, because they didn't fit into this pattern. The way that node2vec works, for example, is it takes random walks from within the graph, and then it uses those, with a neural network, as the training data.
And the reason we didn't like that was because each item of the training data is just a random walk in a graph. The thing is, you're throwing away graph structure and graph information in order to convert from a rich subgraph into a simple sequence, a random walk; you're throwing away data so that you can fit into the training model that you have. The training model can only cope with sequences, because it's adapted from natural language processing, and so we're throwing away graph data in order to suit the neural network model.
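For illustration, here's a minimal sketch of the kind of uniform random walk these approaches sample (real node2vec biases its walks with return and in-out parameters, which this ignores); notice that each training item is just a sequence, with the branching structure around it discarded:

```python
import random

# Tiny graph as an adjacency list
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}

def random_walk(graph, start, length):
    """Uniform random walk: returns a plain sequence of nodes.

    Everything about the subgraph around the walk (branching,
    unvisited neighbours, edge properties) is lost.
    """
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

walks = [random_walk(graph, n, 5) for n in graph for _ in range(3)]
print(walks[0])  # e.g. ['A', 'C', 'B', 'D', 'B'] -- a sequence only
```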
And we didn't wanna do that. We wanted to keep all the graph information and find some way of adapting neural networks to be able to deal with that graph information.
So, we had a kind of goal, but, obviously, we're not the only people doing this, so we took a look at what's already going on in the world of deep learning with graphs. Here are a couple of examples of results from research on using neural networks on graphs.
One of the things I wanna pull out from here is just the fact that these aren't achieving the superhuman performance that we can get with deep learning on things like images or playing Go.
The results on the left here compare DeepGL, which is a graph embedding approach, against node2vec, which is another graph embedding approach. DeepGL is performing better than node2vec, but you can see that the success rate is 87% at best, and really, there's a lot here that is below 80%. Below 80% means getting it wrong more than one in five times, and we're really looking for stuff that's going to achieve a higher success rate than that.
Similarly, on the right-hand side, with classification tasks, you can see there are things that are achieving 50 or 60%. On the bottom right there's a task predicting chemical reactions, and the chemical reaction prediction task is really interesting, because it deals with lots and lots of small subgraphs, compared with looking at things like Reddit, which is analyzing one really massive graph. But still, for the chemical prediction, when we're looking at the top prediction from the neural network, the success rate is 78%. So, we thought that there was some scope to improve on this.
The challenge that a lot of these legacy mechanisms take on is how to take a complicated and variable structure like a graph and fit it into a fixed-size matrix, in order to apply existing neural network techniques. Most existing neural networks take a predetermined size of matrix, and from that they're able to process it and make a prediction.
But with a graph, if you say, I want to look at a person, I might want to look at their friends; or if I'm doing a transit network, and I say, I've got a graph and I wanna look at all of the places that I can get to that are within five stops of the station that I'm at, then, depending on where you are, the size and shape of your data can be radically different. You might only have 10 friends, or you might have 1,000 friends. So, going from the graph into a fixed-size matrix is a technical, mathematical challenge, and a lot of these approaches try to solve that.
One example way of solving it is by taking random walks in the graph, but in the more recent thinking, this tends to be a bit of a red herring.
So, thinking about how neural networks work: a lot of people say that neural networks are good for unstructured data, but, actually, the way that they work and the things they're successful on are tied to very specific data structures. Neural networks are great at classifying images and doing all kinds of image processing tasks, and they're really good at dealing with sequences, so natural language tasks are almost always approached as sequences.
They take a sequence of words, sometimes even a sequence of letters in a word, and sometimes they process the sequence bidirectionally, but they're always dealing with sequences for natural language and with these grids of pixels for images. And the property that these have is that we already know, in advance, how individual data points, or individual positions within the input matrix, are related to each other. In a sequence, the item that comes before something and the item that comes after are more relevant than items that are far away in the sequence. With an image, we know that pixels that are adjacent to each other, or close to each other, are more relevant than pixels that are far away from each other. And the structures that neural networks use to be successful with these kinds of data are built to match that prior knowledge.
So, taking a view of the Go board: the Go model is able to use a dense neural network, because every location on the board is equally relevant to every other location. Because of the way that Go is played, you can play a stone in one corner of the board, and that can have an effect on a position in another corner of the board. So, using a dense network makes sense for that, because there's an equal likelihood that any position on the board might have an important impact on any other position on the board.
But, like I was saying, that's not true for images and sequences. And, as it happens, the most successful models for images and sequences don't use dense neural networks. For images, we actually use convolutional neural networks. A convolutional network builds in this property that adjacent pixels are more important than far-away pixels. So, instead of connecting every single pixel in the input, and adding it or mixing it in some way with every other pixel in the input to produce an output, we only combine nearby pixels together to make a pixel in the intermediate layer, and then we slide that convolution across the image.
- Andrew, I think you cut out a little bit. Can you repeat what you just said? Just wanna make sure everybody can hear you.
- [Andy] Yeah, sure, cool. Just recapping that: in a convolutional neural network, what we do is use a convolution kernel, and that convolution kernel only looks at the pixels that are close to the particular pixel we're concerned with, and then connects those together. It doesn't take into account the values of pixels that are far away. So we don't actually use these dense neural networks with images; we use these convolutional neural networks, and they have built in the expectation that adjacent pixels matter most.
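A minimal sketch of that locality is shown below (a toy "valid" convolution, not any particular library's implementation); each output pixel only ever mixes input pixels under the small kernel:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: each output pixel is a weighted sum of
    only the input pixels under the kernel, never far-away ones."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # 3x3: looks 1 pixel around
print(conv2d(image, edge_kernel))
```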
By analogy, we do a similar thing with recurrent neural networks, which deal with sequences. A recurrent neural network makes the item that comes immediately before an item in the list much more relevant than items that are far apart.
So, based on this, how should we be thinking about graphs? If we look at a graph and a dense neural network, it should be obvious that the dense network isn't encompassing the properties of the graph. What we would like is for the things that are close to each other in a graph to have more of an effect on one another than things that are far apart in the graph. But that's not something that's built into a dense network. So we shouldn't expect a dense network to work here; what we want is a neural network model that can maintain the priors of the graph, that maintains this property that things that are close to each other have more influence on each other than things that are far apart.
The challenge with graphs is that they're really variable in structure. Right, so images and sequences have the exact same structure every time, but a graph is gonna have a different, variable structure, even if it's got a fixed schema.
So, the challenge with graph AI is not just how do we turn a graph into a vector so that we can put it through dense neural networks; it's how do we come up with a neural network architecture that can understand graphs more effectively. And that applies to all of these graphs, from semantic or knowledge graphs through to more database-like graphs. To put it in a phrase, the challenge is how we can structure the neural network to retain graph structural priors.
Now, I'm not the only person thinking about this, and we at Octavian aren't either; many people are, and these ideas are in this recent paper. A lot of this we were working on before the paper came out, and when we read it, we just said, wow, these guys have articulated the same ideas much better than we ever could. The paper is "Relational inductive biases, deep learning, and graph networks", from DeepMind, MIT, and the University of Edinburgh.
That paper is a lot of work to read. I think I've read it about five or six times, and I'm still getting stuff out of it. But I wanted to pull something out of it for you guys. One of the things that that paper expresses is this model for graph processing, and these are the only equations, by the way, in this presentation, other than y = mx + c. So, it doesn't get any harder than this (chuckles). And it's gonna get easier.
They've proposed this algorithm for processing a graph. The first step, the edge update, is that every edge is updated based on the nodes that it's connected to and the global state. The second step in the algorithm is that we update every node, based on the value of the edges that are connected to it. And the final step is that we take the whole graph, all the nodes and all the edges, and we update the global state.
To do this, we define six functions: one function for transforming each edge, and one for aggregating, at each node, the edges attached to it; one function for transforming each node, and one for aggregating all the nodes; and one function for aggregating all the edges, and one for transforming the global state in the last step. With these six functions and this algorithm, the paper proposes that we can implement lots of different existing graph algorithms. And those don't have to be anything to do with neural networks; they can be algorithms like PageRank or breadth-first search. They just propose this as a framework for doing graph computation.
But they go on to say: if we were to make these functions neural networks, or just some of them neural networks and some of them a kind of identity function, then we can train those functions using gradient descent, giving a fully trainable mechanism that learns the functions necessary to transform the graph towards some goal.
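As a rough sketch of a single pass of that algorithm (with stand-in random linear functions where the paper allows arbitrary learned functions; the shapes and variable names here are my own assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny graph: node vectors, edge vectors with (sender, receiver)
# indices, and a global state vector. All features are 8-dimensional.
nodes = rng.normal(size=(4, 8))
edges = rng.normal(size=(5, 8))
senders = np.array([0, 0, 1, 2, 3])
receivers = np.array([1, 2, 3, 3, 0])
globals_ = rng.normal(size=8)

# Stand-in "learnable" functions: one random linear layer each.
W_edge = rng.normal(size=(32, 8)) * 0.1    # edge update function
W_node = rng.normal(size=(24, 8)) * 0.1    # node update function
W_glob = rng.normal(size=(24, 8)) * 0.1    # global update function

# Step 1, edge update: each edge from its own vector, its two
# endpoint nodes, and the global state.
edge_in = np.concatenate([edges, nodes[senders], nodes[receivers],
                          np.tile(globals_, (len(edges), 1))], axis=1)
edges = np.tanh(edge_in @ W_edge)

# Step 2, node update: each node from its own vector, the sum of
# its incoming edges, and the global state.
agg = np.zeros_like(nodes)
np.add.at(agg, receivers, edges)           # aggregate edges per node
node_in = np.concatenate([nodes, agg,
                          np.tile(globals_, (len(nodes), 1))], axis=1)
nodes = np.tanh(node_in @ W_node)

# Step 3, global update: from aggregated edges, aggregated nodes,
# and the previous global state.
glob_in = np.concatenate([edges.sum(axis=0), nodes.sum(axis=0), globals_])
globals_ = np.tanh(glob_in @ W_glob)
print(globals_)
```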
To put this simply, the idea that we're proposing is that we load the specific graph that we're dealing with into memory, and then we use some collection of neural network functions, which are able to learn, to transform that graph in memory, and then, eventually, we get some output by reading information from the transformed graph.
- [Karin] Andrew, we might have to have you repeat that one sentence again, 'cause I think the Internet was cutting out again.
- [Andy] OK, sorry about that. A simpler way to look at this is: what we're proposing is that you take a graph and load it into memory inside some application, where we can then transform that graph in a series of steps. Each of those transformations is carried out by some neural network, or combination of neural network functions, that are end-to-end trainable. This can be thought of as a graph memory network, I think. So, we're not transforming the graph into a vector and putting it through some process; we're actually iteratively updating the graph with a collection of neural network functions.
So, that's the approach that we think is suited to the structural priors of graphs, and it's the approach that DeepMind has proposed in that paper. The question is, you know, from our experience, does it work? Well, we needed a dataset with which to train and test whether it works. So, we built this dataset called CLEVR-Graph.
It's inspired by the dataset that's used for the image reasoning process that I showed earlier on, when we were looking at, you know, what is the color of the sphere to the left of the gray cube, or whatever it was. That task comes from a dataset called CLEVR, and we were inspired by that to create a dataset that was graph-based rather than image-based. And we called it CLEVR-Graph.
All CLEVR-Graph consists of is 10,000 data points, and each of those is a question, an answer, and a graph. Each of these graphs is unique. We modeled it on transport networks, roughly based on the London Underground, but each graph is different and synthetically generated, and each of those synthetically generated graphs is, effectively, unique: it has different stations, different lines, a different organization of those stations and lines. The questions are sampled; there are about 20 question types, for each question type there are some different wordings, which we generated with the computer, and then there are the different answers.
So, on this dataset, we hope to be able to take these unique graphs and answer questions about them. What's really important to understand is that we're not training the neural network to memorize answers on a specific graph. We're actually training it to deal with graphs that it has never seen before and to figure out the answer, given the schema that the graphs respect.
So, here's an example of one of the graphs that's generated in CLEVR-Graph. You can see this is the kind of section from the transport graph, along with a sample of the questions that are included. Particularly interesting questions for me in here are, for example...
- [Karin] Andrew, you cut out a little bit when you were explaining which questions were interesting; do you want to repeat that bit again?
- [Andy] Yes. So, within these questions, the ones that I find interesting are the station adjacency questions, like which station is adjacent to a given station. They require, potentially, multiple steps of reasoning: to find which station is adjacent to a station, given a particular property, requires looking at multiple nodes attached to a given node and looking at their properties.
Whether a station exists at all is also quite an interesting question, so: is there a station called Oxford Circus? Because it might not exist in a particular graph. That question is very interesting, 'cause a lot of neural networks, and neural problems, don't deal with the case where the answer doesn't exist. For example, when you do image classification, typically every image contains something that you can classify; you don't train with images of just white noise or empty space.
And these questions require a mix of different skills from our graph reading engine. Those skills include counting nodes, counting edges, reading properties from nodes, and multi-step reasoning tasks that require traversing the graph, finding the shortest path between two nodes, and combining facts, combining or comparing data in the graph.
So, this is how we're doing with our architecture, training on and testing against CLEVR-Graph. You can see that, because it's synthetic data, we're able to get virtually 100% on these questions, once we've achieved the right architecture.
Here what we've done is separate the questions out by the different skills they require. The first set just requires looking up properties on nodes; then we move on to things that are more complex. The next one, station adjacency, requires looking up both nodes and edges; then we need to look at nodes, their properties, and edges.
Then we get on to things like: how many stations are between one station and another station? That requires the network to learn how to do something like Dijkstra's algorithm, right, finding the shortest path between two points. And you can see we're still getting 98% accuracy on those; we're getting 98% accuracy on stations that are up to nine hops apart.
There are also the existence questions that I talked about, which are challenging for different reasons, and then you can see the questions that we haven't yet tested the network on. So, this is a work in progress. But each time we work on another set of questions, we're maintaining the performance on the previous questions and adding more abilities to our graph network.
So, does it work? I've shown that we're able to achieve a pretty good level of success on a range of different problems, and I do mean this end-to-end: the neural network gets as its input the English language text and the unique graph. And these results are on graphs that the network hasn't been trained on. It's never seen those graphs before; it has just been trained on the schema of those graphs.
The schema being the fact that there are stations, that they have a particular set of properties, and that there are edges that have particular properties and connect those stations. So, we're really confident that this approach is showing much better results than a lot of previous deep-learning-on-graphs approaches, but we don't have the full set of results yet to show that it works. It's promising, but I can't say for sure. But let's talk a little bit about how it works.
I reckon I've got about eight minutes left, and that should be about right. So, to recap the task: we've got a graph, we've got an English language question, and we've got an answer, which is either a number or one of the stations or lines in the graph.
The Graph Networks algorithm gave us a method for propagating information through and transforming a graph, which is great. But we also have to bring our question into the equation; we have to prime the graph to answer a specific query, because we might get the same graph but a different question.
Right, so instead of asking how many stations there are between Bank and Temple, I might ask, is there a station called Oxford Circus, or I might ask, is there a rail connection at Bank. So, we have to take the graph and prime it, or somehow prime the neural network, to answer that question.
So, DeepMind gave us a big boost in how we can structure a neural network to retain graph priors, but that hasn't helped us figure out how to prime graphs to answer questions, which is the task we set ourselves.
So, we took a look at other research, and this paper really stands out and really is the foundation of a lot of work in deep learning at the moment. It introduces a new cell, called the attention cell. And that had a big impact, for example, on the network used for the MACnet image reasoning that I showed earlier.
So, this attention cell is a fundamentally different building block for a neural network from the dense layers. And we can use it to solve the problem of how we prime the graph, and also the problem of how we read out from the graph.
The way the attention cell works is that it allows us to take a query, and then to take a potentially variable-size list of elements, and score each of those elements against that query. Here the elements are embedded question tokens, in this case tokens of the words of the question, but they could also be a list of nodes, a list of edges, or a list of edges and nodes. We take the query and score it against each of those items, then we use a softmax to transform those scores into a probability distribution, then we weight the input items by the score that they get after normalization with the softmax, and then we aggregate them all together and output a fixed-size control signal.
So, one benefit of this is that it allows us to take a variable-length list, like a list of nodes or a list of edges, and convert it into a fixed-size signal.
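Here's a minimal sketch of such a cell (my own toy dot-product version, not MacGraph's implementation): score each item against the query, softmax the scores into a distribution, and take the weighted aggregate:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention(query, items):
    """items can be any number of rows (words, nodes, edges);
    the output is always one fixed-size vector."""
    scores = items @ query            # score each item against the query
    weights = softmax(scores)         # normalize to a distribution
    return weights @ items, weights   # weighted aggregate + focus weights

rng = np.random.default_rng(0)
nodes = rng.normal(size=(7, 8))       # variable-length list of node vectors
query = rng.normal(size=8)
signal, focus = attention(query, nodes)
print(signal.shape)                   # (8,) regardless of how many nodes
print(focus)                          # which nodes the cell attended to
```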
Attention has really been used to make groundbreaking improvements in things like natural language processing and word embedding, as well as in multi-step reasoning and in understanding, for example, what the network is looking at. By looking at these scores, we're able to generate those images that I showed earlier, which show you what the neural network appeared to be looking at.
So, we really use these attention cells. The way we use them with a graph is to write a signal into the graph, based off of the query that we were given. So the query is something like, how many stops are there between Temple and Bank, and that query is then used in an attention cell which is looking at every single node in the graph. This could be every single node and every single edge, or it could be every edge, but in our implementation we just look at the nodes. Based on the score for each node with respect to the query, we write a weighted input signal onto that node.
So, if we're asking a question about Temple and Bank, clearly those stations would be more relevant to that question than stations that didn't have those names, and we wouldn't expect them to get the same input from this attention. We use this approach to get a signal into the graph, to prime it and make it care about the question that we're dealing with. Then what we can do is propagate the information through the graph with message passing. And then the challenge we have is: how do we read the answer out of the graph?
We need to find the node that is the answer to the question, or pull out the number that answers the question, or the name of a station, whatever it is. And we can use attention for that as well. We use attention to read out of the graph by taking the query again and trying to figure out which node in the graph contains the information that's relevant to answering the query.
So we put all of that together, and this is the architecture, called MacGraph.
- [Karin] Andrew, you cut out just right when you were saying that this is called MacGraph, so can you repeat yourself again?
- [Andy] Yeah, so, we call this MacGraph. And this is the really high-level view of the architecture. There's a lot more information about it in the GitHub repository; like I said, all of our work is open source. But from 20,000 feet: we're using attention to send the signal into the graph, message passing to let the graph transform itself, using neural network functions that learn over time how they're supposed to operate, and then we're using attention to read information out of the graph again.
And that's the architecture that we're using to get those results on that range of question-answering tasks on unseen graphs. So, what we've been able to achieve is a neural network model that we can train with gradient descent, that learns how to navigate a particular graph schema and, within that graph schema, learns multiple algorithms to answer multiple different questions. And we're doing it with end-to-end training, so we're training it on exactly the task that we want it to be able to do.
That training data looks like: a question, an answer, and the graph that we want it to answer on. And we're showing that we can take a single model and architecture, and it's able to learn a range of different graph algorithms to solve different problems, right: we've got Dijkstra's algorithm, we've got reading properties of nodes, and we've got breadth-first search and finding which adjacent station has a given property.
We think that that's really promising, and there's a lot more to do. Already we've shown that this approach can definitely learn graph-specific algorithms. And, yeah, we hope to continue researching this, using the approach that we've taken, which is building synthetic datasets and using those synthetic datasets to understand how the neural network works and to improve its performance on tasks that we should be able to win.
So, that's where we are today at Octavian, and I invite anyone who's interested to come and participate. You can email us, you can tweet us, and you can check out our repository on GitHub. We're always interested in getting more people involved in this research, and in finding new problems that we think we might be able to apply it to. So, with that, I'll hand it over to Mark, I think.
- Oh, yes.
- [Andy] Any questions?
- [Karin] We do have some questions that came in. Mark Kristoff asked if there are any recommended learning resources to help someone better understand the material. David Mac did already respond in the chat, mentioning the Octavian blog and the deep learning book, but if you have any other suggestions there, maybe from your own journey of exploration...
- [Andy] Yeah, so, let me switch. Reading some of those articles is really great. In particular, you know, I'm a big fan of the DeepMind paper, and there's a thing called GraphSAGE from the network group at Stanford, who have some really good articles and papers. That's really how all of my journey has come about: as well as reading that material, give yourself really simple problems to do. Can you, you know, just train something that can take a list of numbers, for example, and pick a number out of that list using attention? You can build very simple problems and apply the same techniques to them.
- [Karin] Nice, OK. We also have another question, from Victor Lee. He asked: so, in the general computational model, in each iteration every edge and every vertex is getting updated? Is that really practical on Neo4j?
- So, that's how the model works: we transform the whole graph multiple times. And we're not using Neo4j to do that; we're doing it all in TensorFlow. As for whether or not it's practical doing it in Neo4j: if we were running inside the JVM, inside Neo4j, then I think it would be practical. From outside of Neo4j, it may be a little harder, but it depends on the speed and parallelism that we can support, because, potentially, these are super powerful algorithms.
- [Karin] Oh, you cut out a
little bit towards the end.
- Oh, and, potentially, it's really parallelizable, so you're just sending a lot of concurrent updates into the graph and concurrent reads, and that's a workload that isn't that daunting for Neo4j.
- [Karin] Awesome.
So, if anybody has any other questions after this hangout is over, you can ask on our Neo4j community site; there's a link to it in the description of the YouTube video. You're also able to post there if you have an idea for something that you wanna talk about at the next online meetups, under the Projects or Content categories; you know, if it's a project, you can put it under the Projects category.
And also, is there anything else, Andrew, that you wanna mention before people go, aside from the Octavian blog, anything else that might be good?
- No, I mean, aside from the Neo4j community and the Octavian blog, those are great places to ask questions.
- [Karin] We do have another question that came in, and it's interesting. Robert Shinusu, I'm sorry if I mispronounced that, asked if you're using Neo4j to train the model. And Ato Atom asked about how multiple levels... I mean, do you actually see the chat? You might actually see these questions too, so maybe there's something here that you would be interested in.
- I have an answer to that one.
- OK.
- But, yeah, so we are not using Neo4j to train.
For the stuff we're doing, we are experimenting with altering and adjusting neural networks and changing all kinds of aspects of them, so we're working in memory, in TensorFlow. But we do use Neo4j for storing and sharing and transforming graphs. And we hope, you know, to be able to bring the two closer together. When we're not working on synthetic problems, when we're working on real-world data that's transactional, that's being stored and has some importance, then I expect that we're gonna need to deal with databases much more.
- [Karin] A lot of people are saying that they really liked the talk. Everybody watching: if you really enjoyed Andy's talk, please put a thumbs up on the video. And we do have another question. Santiago Gonzalez asked: have you applied the techniques you explained here to compute subgraph similarities?
- Can you say that again, Karin?
- [Karin] He said: have you applied the techniques you explained here to compute subgraph similarities? OK, I think the sound cut out a little bit. So, if you can hear me...
- I'll see if I can find the chat and read the question as text, but I think the Internet gods are...
- [Karin] Yeah, I mean, again, if anybody has any other questions that you wanna ask, you can definitely go to our community site. Andy actually posted a thread, and you can talk directly to him. You can ask questions there, and it'll be valuable for other people, even after the fact. We also have a talk next week, on Wednesday, November 21st, at the same time as this one. That's gonna be on similarity graph algorithms, so that should be an interesting one.
Yeah, so, I think we can probably call it the end of the show. Thank you so much, Andy, for taking the time out and showing us this. I hope everybody enjoyed it, and, hopefully, we'll see you guys next week.
- You're welcome.
- Bye.
