>> Next speaker is Alex Gaunt,
and he will be speaking about something very exciting.
In fact, he partially invented
Graph Neural Networks here in Cambridge.
>> Thanks very much, Kevin Hewey.
>> Yeah.
>> So, I want to talk a little bit about
Graph Neural Networks and I want to give
just an overview of what's been going on;
Microsoft Cambridge has had a lot to
say in Graph Neural Networks.
But I just want to give a very gentle talk introducing
the idea of the graph neural network and
the history behind it and give a couple of applications.
So, hopefully this is a nice talk to everyone
because it won't be too taxing.
So, one big objective I also have is to
advertise the fact that we have
released very efficient implementations
of Graph Neural Networks on GitHub.
So, there's the link there,
and people have already taken an interest in our code;
some have already started to
convert it to Python and
to other frameworks.
If you want to try out some of the things
I discuss, head over there.
So, what's the problem we're interested in?
Suppose you have a dataset that contains many graphs,
and each of these graphs has some sort of label
maybe a real value, or a classification,
or something more sophisticated,
and you want to design some sort
of machine learning function that
can take in a graph as an input,
and return you the label as the output.
So, to give you a concrete example,
maybe you have some molecules,
which are naturally represented as graphs,
and these molecules will have some properties;
maybe an interesting property
would be how likely that molecule is to be a drug.
And you want to build a function that's able to
take new molecules and tell
you whether they are likely to be a drug.
How are we going to approach this?
The starting point is going to be
recurrent neural networks, which are
a good place to start, and very successful.
Recurrent neural networks basically
operate on a very special type of graph: the chain graph.
So, if you have some text,
then it's a sequence of tokens,
all connected to each other by these links.
And we represent chain graphs using recurrent units:
we replace each node with a recurrent unit,
which I represent with a triangle,
and we link them together with arrows.
And then we proceed by embedding each token in the sequence.
So, there are some sort of words
stored in these nodes,
and we embed them, and I'm going to represent
the embeddings as envelopes,
to sort of invoke the idea of a message.
And each node gets
these envelopes, and then we do our usual thing of
just running the recurrent neural network forwards,
using some sort of recurrence relation which, given the
current state and the new message, gives us a new state.
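As a minimal sketch of that recurrence relation, here is a toy NumPy version. The tanh update, the weight matrices `W` and `U`, and the state size are illustrative assumptions, not the actual model from the talk:

```python
import numpy as np

def rnn_step(state, message, W, U):
    """One recurrence step: combine the current state with the new message."""
    return np.tanh(W @ state + U @ message)

rng = np.random.default_rng(0)
d = 4                                    # state / embedding size (arbitrary)
W = rng.normal(size=(d, d)) / np.sqrt(d)
U = rng.normal(size=(d, d)) / np.sqrt(d)

# Run the chain forwards: each token embedding is one "envelope"
state = np.zeros(d)
for message in rng.normal(size=(3, d)):  # three token embeddings
    state = rnn_step(state, message, W, U)
```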
So, hopefully this is all very very familiar,
and really the purpose of this slide was to
introduce this graphical notation.
Because what we're going to do now is go
to graph-structured data:
if we start adding edges to this chain,
then we get a general graph.
And the important thing about graphs
is you can represent them as a nice adjacency matrix.
But there's a fundamental symmetry that graphs have
that you know you can draw them
in many different ways and
they all represent the same thing.
You can permute the order of
the vertices and you get a different adjacency matrix.
But fundamentally, all of
these representations on the slide
represent exactly the same graph.
And so any model that we construct for
analyzing graphs has to be
invariant under these symmetries.
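To make that symmetry concrete, here is a small NumPy check: relabelling the vertices of a path graph with a permutation matrix P gives a different adjacency matrix A' = P A Pᵀ, but graph invariants, such as the degree sequence, are unchanged:

```python
import numpy as np

# Adjacency matrix of a 3-node path graph: 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

# Relabel the vertices with a permutation matrix P: A' = P A P^T
perm = [2, 0, 1]
P = np.eye(3, dtype=int)[perm]
A_perm = P @ A @ P.T

# A different matrix, but the same graph: e.g. the degree sequence agrees
same_degrees = sorted(A.sum(axis=0)) == sorted(A_perm.sum(axis=0))
```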
So, how are we going to adapt the RNN framework to graphs?
So, here's a graph;
it's provocatively shaped like a molecule,
but it could be any graph.
And we're going to start just as we did before,
by taking some features of the nodes.
So maybe this node is a carbon atom,
and its feature vector is just a one-hot encoding:
is the atom hydrogen, is the atom carbon,
is the atom fluorine, with a one in the matching slot
and zeros everywhere else.
So, we can put some features onto the nodes, and
we store them in the state
of the nodes, and that's this gray envelope.
And then, we will do that for all the nodes,
they all get their features,
and then we associate a neural network
with every edge of a specific type.
So, maybe we have single bonds
represented by these green edges,
and double bonds represented by yellow edges,
but in general you might have
many different edge types like that;
a knowledge base would have different
types of relationships, and so on.
So, that's the basic idea
and then we're going to replace all
of the nodes with recurrent units, these triangles.
And the message passing is going to proceed as follows.
Imagine you zoom in on this particular node,
that node will pull all of
the messages from its neighbors,
and as the messages are pulled,
they will go through the neural networks on
the edges they have to pass over.
And all nodes at time
T will pull the messages
from time T minus one from their neighbors.
So, all nodes are pulling simultaneously
from the previous timestep.
Once we have collected all the messages,
we perform a sum,
and this sum is invariant to the order of
the neighboring messages: you can permute
the envelopes inside that sum and
the sum hasn't changed. Yes?
>> Just a clarification, the neural networks are
they somehow bidirectional or what is that?
Usually a neural network maps from inputs to
outputs, so [inaudible] think about [inaudible]?
>> The neural network simply maps these messages, as
vectors, to a vector of the same size.
>> What happens if you send the messages the other way?
Is there a separate neural network or [inaudible]?
>> So, the edges are all directed,
and if you want an undirected edge,
you can add the same edge
backwards and then share the network on both edges.
Okay, so that was a single timestep.
So, we just pull messages from our first order neighbors.
So, you can think of it as: after this single timestep,
each node basically knows about its own information,
and the information from nodes a distance of one away.
And then we just repeat this over and over again.
So, after the second timestep,
each node knows about
its first order and second order neighbors,
and we can just keep going, and we
stop after a fixed number of timesteps;
in our particular variety of graph neural networks,
that fixed number is a hyperparameter.
So, you decide how many steps you're going to propagate:
sort of, the radius over which
you'll be smooshing
the information around, before you stop.
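Putting the pieces together, here is a toy NumPy sketch of the propagation just described: per-edge-type message networks, a permutation-invariant sum over the pulled messages, and a fixed number of timesteps T as the hyperparameter. The linear message networks and the simple tanh state update stand in for the gated units of the real model, and the example graph and all shapes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 8, 4              # node state size; number of propagation steps

# Toy molecule-like graph with typed, directed edges (src, edge_type, dst);
# each undirected bond appears once in each direction
edges = [(0, 0, 1), (1, 0, 0),   # a "single bond" between nodes 0 and 1
         (1, 1, 2), (2, 1, 1)]   # a "double bond" between nodes 1 and 2
num_nodes, num_edge_types = 3, 2

# One linear message network per edge type (the real model uses richer
# networks and GRU-style state updates; this tanh update is a stand-in)
E = rng.normal(size=(num_edge_types, d, d)) / np.sqrt(d)
W = rng.normal(size=(d, d)) / np.sqrt(d)

h = rng.normal(size=(num_nodes, d))       # initial node features
for _ in range(T):
    msgs = np.zeros_like(h)
    for src, etype, dst in edges:
        # Each message passes through the network on its edge; summing
        # into msgs[dst] is invariant to the order of the neighbours
        msgs[dst] += E[etype] @ h[src]
    h = np.tanh(h @ W.T + msgs)           # simplified state update
```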
And once you've
finished going round, and round, and round,
you have representations stored on all the nodes that
somehow have collected information
from the local environment of the graph,
and then you just collect them all up and perform a sum,
and again this sum is permutation invariant.
And then you have
a representation of your graph, which you're going to feed
to whatever higher layers you
want, to perform your task.
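The readout step can be sketched in the same toy style: sum the final node states into a single graph representation, feed it to an output layer, and note that shuffling the nodes cannot change the result. The single linear scoring layer here is a made-up stand-in for whatever higher layers you use:

```python
import numpy as np

rng = np.random.default_rng(1)
h = rng.normal(size=(5, 8))     # final states of 5 nodes, dimension 8

# Permutation-invariant readout: sum the node states, then apply a
# (made-up) linear output layer to get e.g. a drug-likeness score
w_out = rng.normal(size=8)
graph_repr = h.sum(axis=0)
score = float(w_out @ graph_repr)

# Shuffling the node order leaves the graph representation unchanged
h_shuffled = h[rng.permutation(5)]
```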
So, that was a very Microsoft-centric view
of what a Graph Neural Network is,
and it summarizes the code that's on GitHub.
but Graph Neural Networks,
like all good ideas,
were invented a very, very long time ago.
At least by deep learning standards.
And in this work,
we have this hyperparameter that says
you can unroll for a certain number of steps.
In the original work, they said
we're going to just unroll forever,
and we're just going to keep going until we
reach a fixed point, and then we'll stop.
And I can show you that if you do that,
you have to put
some constraints on the forms of the neural networks,
of the recurrent units,
and those constraints mean that
actually even though you're doing all of this unrolling,
the influence of one node on any other node,
exponentially decays with the distance between the nodes.
And so, it's actually not a very good way
of sending information around the graph.
And so, this was updated at Microsoft Cambridge to use
Gated Recurrent Units or LSTMs as
a means of transferring longer
range information across the graph.
And so,
the story I've told you has
been approaching the whole idea of graphs
starting from Recurrent Neural
Networks; in parallel,
people were thinking about convolutions on graphs.
And so, you can
perform a convolution very efficiently in Fourier space.
And the Fourier transform of the graph
is a well-defined operation,
and so people came up with
the concept of convolutions on graphs,
and there were a series of approximations.
Performing the Fourier transform
is very computationally expensive.
So, these guys just did
the expensive computation, and then they
made progressive approximations to make it more
and more efficient until, finally,
someone realized that they had made so many approximations
that actually everything was just the same thing:
these convolutions had converged on
exactly the same architecture
as the Graph Neural Networks.
And there was a great paper, which was
the first proper application of this
to some serious chemistry data,
that unified all of these different methods,
and now the field has exploded.
So, I just mentioned the Microsoft based papers
here, but there are very many,
many papers appearing on learning or
using graphs in recent conferences.
So, yeah, just once again,
flash this advert up.
All that I've been describing has been
in this blue box labeled 2,
so that's the particular variety
of model I've been describing.
So, what can we do with this?
I've been mentioning chemistry every now and again.
And sure enough, if you put these Graph Neural Networks
inside some other optimization algorithm,
and use these Graph Neural Networks as a scoring system,
then you can come up with molecules that are
very likely to be drugs.
So, this drug-likeness scale goes from nought to one,
and you can generate these molecules that are
very likely to be drugs and have been discovered before.
You have to be careful, because you can come up with
adversarial examples that the network
is very confident are drugs,
but actually they are very bad and not drugs at all.
So, there's a lot of work left to actually use these in
realizing this dream of deep learning to
revolutionize the drug industry,
but it is promising that we can very easily
get these very high scoring molecules.
And maybe, as another example,
there'll be a talk about
this particular example later, I believe,
and it comes from Miltiadis Allamanis
and Marc Brockschmidt.
So, the idea here is that
a program is a stream of tokens,
and you can see it as
a stream of tokens in grey at the bottom there,
so for example Assert.NotNull open brackets,
but you can augment this stream of tokens with
the abstract syntax tree on top of that stream.
And you can have other ideas, like connecting
positions where variables were last read and written.
And so you can build
a big graph that represents programs,
and you can ask a Graph Neural Network to
solve problems on programs.
For example, I could give you
this snippet of code and ask you to fill in
the blank, knowing that the possible type-correct
variables to put into this slot are clazz or first.
It turns out that the developer who was
writing this code had written clazz,
and you can see
that this has come about from a copy-and-paste problem:
they should really have written first,
but they've just copied and pasted that snippet of code.
And so, we want to use a Graph Neural Network
to instead look at
those possible candidates first and clazz,
work out a representation for the slot variable,
and work out the representation for
the first and clazz variables in that graph,
and find which representation best matches the slot.
And the Graph Neural Network will point out that,
in this case, the first variable
should be placed in the slot.
And we are able to find bugs in
some real code repositories using this method.
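As a sketch of that matching step, suppose the network has already produced an embedding for the slot and for each type-correct candidate. The talk doesn't specify the matching function, so a plain dot product is assumed here, and all the vectors are fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8

# Fabricated embeddings standing in for real GNN outputs
slot = rng.normal(size=d)
noise = 0.1 * rng.normal(size=d)
candidates = {"first": slot + noise,         # close to the slot representation
              "clazz": -0.5 * slot + noise}  # points away from it

def best_match(slot, candidates):
    """Return the candidate whose representation best matches the slot."""
    scores = {name: float(vec @ slot) for name, vec in candidates.items()}
    return max(scores, key=scores.get)
```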
So, just to give you
some idea that Graph Neural Networks
are actually better than some other baselines.
You can do exactly the same variable misuse task
using an RNN to look
at the surrounding context or
an RNN around each use of a variable,
and you get worse performance.
And similarly, rather than selecting from
a set of type-correct variables,
you can use the representation of
the slot to initialize an RNN,
which is going to produce a string
that is a good variable name to put in that slot.
And again, the Graph Neural Network
outperforms simple natural language processing baselines.
So, yeah, that was a whistle-stop tour,
in 17 minutes, of the area of Graph Neural Networks.
And if you want to dive in,
then, in that case hop over to the GitHub page.
>> If you have other questions,
in fact we need to fill three minutes
until the pizza is there.
>> I wondered if
you'd like to comment on
the theoretical graph results
about graph isomorphism [inaudible]
some of these tasks.
>> So, we never actually tried
using GNNs just to directly solve
the graph isomorphism problem.
And really, we just built the symmetry,
the permutation symmetry, into the model;
that was the prior.
So, we haven't really
thought too deeply about isomorphisms,
but, I mean, if you have two isomorphic graphs,
then what would you hope for?
>> So, I guess the question would
be: are there limitations
to the embeddings on that basis?
Maybe there are classes of graphs
where we wouldn't be able to
tell the difference between them?
>> So, one big issue is that we do this,
I mean, it's not related to isomorphism,
but we do this truncated propagation.
And so if you have
a cluster over here and you're propagating inside
that cluster and then you have a very long
thin connection to another cluster,
then if our propagation
doesn't go far enough along that branch,
then we won't be able to tell the difference between
that graph and a graph with a slightly
longer link, or anything like that.
So, yeah. I mean, it's not ideal,
but it's the first step.
>> It seems to work very well, that's
right, better than these others.
>> I think that's because natural graphs are not
pathological in some sense.
>> So, that aggregate operation you do with
the sum of all the resulting messages:
I can see that as a pooling operation,
and I was just wondering if
other types of pooling would make sense.
>> Yeah, I mean, that's a hyperparameter we have:
sum, mean, max.
Yeah, max is a bit unstable,
but sum and mean are fine;
you can come up with your favorite, and
maybe you'll do better.
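For concreteness, the aggregator choices mentioned here differ only in how the stack of neighbour messages is reduced, and each is permutation invariant over the neighbours:

```python
import numpy as np

msgs = np.array([[1., 2.],    # messages arriving from three neighbours
                 [3., 0.],
                 [0., 4.]])

# Candidate permutation-invariant aggregators over the neighbour axis
pooled = {"sum":  msgs.sum(axis=0),
          "mean": msgs.mean(axis=0),
          "max":  msgs.max(axis=0)}
```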
>> Are there probabilistic versions of these things,
like [inaudible] probabilistic models?
>> I mean, at the moment we've done
very much a clone of RNN-style things.
There's no stochasticity inside our models at all.
Yeah. So, that would be an interesting next step.
>> The last question or is everybody hungry?
I can smell the pizza outside.
Great. Let's thank Alex again, for a great talk.
