PRESENTER: So today
we have Kelsey Allen
talking about graph nets.
Kelsey is a grad student
in the Tenenbaum lab.
And so I'll pass it off to her.
KELSEY ALLEN: Hey, everybody.
Thanks for coming
out this morning.
As Jenelle mentioned, I'm a grad
student with Josh Tenenbaum.
And I'm going to
be talking today
about work that I did while
I was an intern at DeepMind
on a suite of approaches
that we call graph networks.
The talk is going to be about--
how do I get to
the-- there we go.
I'm going to give just about a
30-minute introduction to graph
networks, including an
introduction to what they
are generally,
talking specifically
about what graph networks
are mathematically
and how they're
applied, and then
giving a specific
example that I worked on
while I was at DeepMind on
using graph networks to do
physical reasoning for different
kinds of construction tasks.
And then the rest of the
time of this tutorial,
which I think is going
to be the most useful,
is actually getting
some hands on experience
with the graph nets library
that DeepMind released,
which is all in TensorFlow.
So hopefully, those of
you who have laptops
can sort of walk
through this, and it
should be very useful, I think,
to go through it together.
And then at the very end, we
could always have a discussion.
I'm very excited about
talking to people
about ways in which
you might want
to use graph networks for
your own work, and also what
kinds of extensions to graph
networks we might think about
for future work.
So just to give an introduction:
when I was at DeepMind,
I was working in the
cognitive science group.
And so a lot of the
work that we did
is influenced by thinking about
what kinds of things humans
need to be able to
reason about in order
to interact with the world.
So in most of our
daily experience,
we interact with different
kinds of very structured scenes.
So on the left, you
have a structured tower
that's made of
different blocks that
can be connected to make even
taller towers, which you can
then play with as an infant.
And on the right, we often think
about graphs of communities,
like all of our social
media connections
or inside our families
who's connected to whom,
or inside this department
who's connected to whom.
And these are just two
different examples of graph
structured kinds of reasoning
in two very different
kinds of domains.
And so we would
like approaches that
can easily handle these
kinds of structures
that we see all the time.
To give some other examples,
computational and
empirical, of work
in the realm of
cognitive science
that relates to the
graph network work we've done:
in the late 1990s, Dedre Gentner
did really fabulous work looking
at how people do analogical
reasoning using relationships.
There's been a whole lot of
work on hierarchical planning
and composing different kinds of
routines, which you could also
think of as a sort of tree-shaped
graph for how you might combine
lower level primitives into
higher level plans as well
as things like discovering
relational structure
like Charles Kemp and Josh
Tenenbaum's classic work
on trying to discover
graphs of relationships
between different
kinds of, for example,
animals or other kinds
of semantic information.
To go back and look at
sort of the history of AI,
early on many classic
approaches were
focused on discovering and
using structure for reasoning.
So things like logic, grammars,
graphical models, et cetera
are just some examples.
And these were really
critical when data was sparse,
because using structure
affords us very strong
inductive biases, which make
learning from sparse data
reasonably tractable.
And so early on, these
connectionist models
that have now become
very popular often
failed because they didn't
have these strong biases that
would have let them learn
effectively from sparse data.
They required more data, to
make up for the lack of biases
and learn that structure,
than was available at the time.
However, when data
became more available,
it became clear that some of
our structural assumptions
that we were using in these
early models were incorrect.
And so at this point,
and as we all know now,
these connectionist
approaches, now broadly
termed deep learning,
have been employed
to great effect
in all kinds of
different scenarios.
And in particular, I think the
most compelling way it's been
applied has been in vision.
And I think the reason for
that is because we don't really
have a good understanding of the
underlying structure in vision
that is in some sense correct.
And so learning the structure
from massive amounts of data
has proven to be
substantially more effective.
However, now we've taken
again a turn in deep learning
in thinking about how we
can bring structure back
to these kinds of approaches
and more meaningfully integrate
learning at really large scales
with the kinds of structure
that we expect to
see in the world.
And so deep learning
has now been
used in several different
approaches combining
classical AI ideas
and more recent work.
Things like deep learning
for logical reasoning.
And a lot of the work the
Tenenbaum Lab has done
is in using deep networks
for initializing inference
or for learning proposals
for graphical models.
And so in this talk,
I'm going to be
talking about a
similar sort of theme
but focused particularly on
deep networks that operate
on graph-structured data.
So why might we actually want
to use graph-structured data?
Well, in a variety
of different kinds
of things we might
care about, graphs
are really popping
up everywhere.
So here are some examples
that I think cover a very
wide range of different
ways in which you could
think about the world
as graphs.
On the upper left
here, Qui et al.
used graph networks to try
to predict traffic flow.
So you could imagine a
city map as a graph where
the nodes are the hubs
that people are moving
between and the roads
are the edges in that graph,
whose congestion you're
trying to predict, for example.
You could also consider
organic chemistry, where
a molecule is described
as a graph in which each atom
is a node and the bonds
connecting those atoms are
the edges within that graph.
And you could use that graph
structured representation
to try to predict, for
example, the energy
or the fundamental frequency
of that resulting molecule.
On the upper right, you
can use graph networks
to describe the
bodies of agents.
You can describe the body
of a particular agent
as a graph in that each limb
is a node within that graph
and the joints connecting
those limbs are the edges.
And you can do things
like try to predict
how that body will move when
a given limb or joint is
perturbed.
And then the work that
I was mostly focused on
in my internship was on using
graphs for physical prediction.
So you can imagine
in a physical scene
that each object
is a different node
in that graph and
the edges connecting
those objects are the forces.
So things like gravity or
elasticity or anything else.
And you can then use
that to try to predict
the motion of objects in
different physical scenes.
So to give even just a couple
more examples on the bottom
two rows, you can also think
about tree structured data
as being a kind of graph.
So you could use
graph networks to try
to do things like, for
example, semantic parsing,
since a parse tree is just
another kind of graph.
You can also not
make any commitment
to where the objects
are and instead
just break an image into
different grid cells that
could be fully connected,
and that could also
be seen as a graph.
So in all of these cases,
we can think about--
I'm going to use the terms
entities and nodes somewhat
interchangeably, and likewise
relations and edges.
So the relations
are the connections
between the entities.
Interpreting some of our
standard deep learning tools
in graph terminology is,
I think, informative for
thinking about the kinds
of inductive biases the
different layers afford us.
So at the weakest level, we
can imagine a fully connected
layer, like the one on the left.
And here the entities
are just the units
in the neural network.
So each of these different
blocks is an entity,
and the relations are
all to all connections.
And so the inductive
bias afforded by this
is relatively weak
because we haven't
assumed any inherent
structure in that setup.
Making it a little
bit more structured,
we could use things like
convolutional layers
where now the
individual entities
are the grid elements
in your image
and the relations are local.
So this affords an
inductive bias of locality
and now makes a convolutional
layer spatially invariant.
So you can do any different
kinds of spatial translations,
and it will be robust to that.
Recurrence is another kind of--
you could imagine that
as another form of graph
where now you have entities
as being the time steps
and the relations
are sequential.
So time step 0 is
connected to time step 1.
And so this gives you the
inductive bias of sequentiality
and the invariance
of time translation.
And finally, graph
networks are in some sense
just an extension of
these various ideas.
But now we'll talk about
general nodes as the entities,
general edges as the relations,
and the inductive biases
that you can now
get are arbitrary
and depend on the
structure of your graph.
And the invariance you get is to
node and edge permutations.
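Just to make those four
patterns concrete, here is a
little sketch in plain NumPy,
my own illustration rather than
anything from a library, that
writes each layer's connectivity
as explicit sender and receiver
index lists over the entities.
This is the same form the graph
nets library will use later.

```python
# Sketch: each layer's relations written as sender/receiver index lists.
import numpy as np

def fully_connected(n):
    # Fully connected layer: all-to-all relations between n units (n >= 2).
    senders, receivers = zip(*[(i, j) for i in range(n)
                               for j in range(n) if i != j])
    return np.array(senders), np.array(receivers)

def grid_local(width, height):
    # Convolutional layer: each grid cell relates only to its 4 neighbors.
    senders, receivers = [], []
    for x in range(width):
        for y in range(height):
            for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height:
                    senders.append(nx * height + ny)
                    receivers.append(x * height + y)
    return np.array(senders), np.array(receivers)

def sequential(n):
    # Recurrence: time step t sends to time step t + 1.
    return np.arange(n - 1), np.arange(1, n)
```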
To go into a little bit
more intuition on that
and really hammer home this
idea of being invariant
to the order of
the entities, I
want you to consider
this solar system.
So we have a sun in the center
and a bunch of planets orbiting
that might each have a moon.
And we want to, for
example, predict the center
of mass of the system.
So critically, the order in
which we consider the planets
should not matter.
It shouldn't matter if
I represent this scene
as being this
feature concatenated
with this feature concatenated
with this feature instead
of this one and then this
one and then this one.
But if we were
trying to represent
this scene in a
classic sense, we
would have to commit to some
ordering of these entities
in order to use a standard
deep learning kind of approach.
And so, for example, a standard
multi-layer perceptron trained
on the features of the entire
scene won't be order invariant.
And so instead what we
would like, as in physics,
is to apply the same
function to all the objects
and interactions in the
scene and then aggregate
that information to
make predictions.
And so this is really at the
core of what graph networks do.
In order to get into
a little bit more
of the mathematical details
of how these things work,
I'm going to use this
general graph definition.
So actually, before
I go any further,
does anyone have any
questions so far?
Yeah?
AUDIENCE: Can you just
rehash the explanation of
why the multilayer
perceptron has the order--
KELSEY ALLEN: Has
this order effect?
OK.
Yeah.
Maybe I can draw it.
So imagine we're going to
call this 1, 2, and 3 and 4
as our entities.
And I'll forget about
the moons for now.
And so we'll have--
I guess I can't
easily write anywhere.
AUDIENCE: Do you want
that board out there?
KELSEY ALLEN: Maybe.
AUDIENCE: [INAUDIBLE]
KELSEY ALLEN: I'll try to
explain at a high level
while they're getting the board.
So if you wanted to apply
a standard deep learning
kind of approach, you need
to represent this scene
as some vector, as a single
vector representation.
And so in order to do that--
thank you.
Yeah.
Oh, green is terrible.
So imagine this has some
position that's like--
actually I'll just call this
position one, position two,
position three, position four.
So if you wanted to
predict the center of mass
of this whole
system, then you need
to represent the
scene in some way.
So the standard
thing to do would
be to concatenate your
features and then run this
through some network and
then predict center of mass.
Does this make sense?
But you should equally be
able to do this and predict
that same center of mass.
But this feature
representation is
completely different from this.
And so if you were
to train it such
that you always flip the
order of these features,
you could learn to
become invariant
to the order in which
you present them,
but it's not inherent
to the network.
So what we would like
is to not have this issue,
where we have to know in
advance to train on every
ordering of our features,
and to always get the
same center of mass.
Does that make intuitive sense?
Yeah?
AUDIENCE: So it's actually
about the trained network
and not necessarily
about the order that you
choose at the beginning?
KELSEY ALLEN: Can you explain
what you mean by that?
AUDIENCE: So you could
make a network that
works on any of these
features, but then you
can't switch it
after you've already
trained it to work on
a new set of features.
KELSEY ALLEN: Exactly.
So you want to have
something that can always
be trained where you don't need
to know the order in which you
will show the features to
that network in the future.
So if I also, for
example, always
knew that I was going
to have this object
in this first position,
that would also
be fine for a classic approach.
Does that make sense to people?
OK.
AUDIENCE: I guess
what's not clear
to me is why it's a
problem to know that--
to fix the order in
which you put things?
AUDIENCE: The thing is, the
way these things will work,
I guess, are just [INAUDIBLE].
So just allow a class of
functions that are invariant.
KELSEY ALLEN: That's right.
Yeah.
AUDIENCE: The permutation or
whatever structure there is.
So it's more of an
example of where--
this is an example
of a problem where
you might kind of
get an advantage out
of this invariance
under permutations.
So I guess--
KELSEY ALLEN: Yeah.
Another thing, by the way,
that graph networks can do
that an MLP could not
is add another guy here.
So this is maybe more clear.
But if I wanted to add a
fifth object to this system
and still compute
the center of mass,
I now don't have the same
sized feature vectors.
So I can't use the
same network at all.
Does that make sense to people?
OK.
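By the way, here is that board
example as a little sketch in
NumPy, again my own illustration:
the same function is applied to
every object and the results are
summed, so the answer is invariant
to the order of the objects and
works for any number of them.

```python
# Sketch: permutation-invariant center of mass via shared per-object
# function plus sum aggregation.
import numpy as np

def center_of_mass(objects):
    # objects: array of shape [num_objects, 3] with rows (mass, x, y).
    masses = objects[:, 0:1]
    positions = objects[:, 1:]
    # The same per-object function (here just mass-weighting) is applied
    # to every row, then summed: both steps are order invariant.
    return (masses * positions).sum(axis=0) / masses.sum()

planets = np.array([[10.0,  0.0, 0.0],   # sun
                    [ 1.0,  4.0, 0.0],
                    [ 1.0,  0.0, 4.0],
                    [ 1.0, -4.0, 0.0]])
np.random.shuffle(planets)  # any ordering gives the same answer
print(center_of_mass(planets))
```

In a learned model, the
hand-written mass-weighting would
be a shared network instead, and
the sum aggregation would still
make it order invariant and let
you add a fifth object.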
All right, so to get more
into the details of that,
we're going to use this
general graph definition.
So on the left there's a graph.
And I'm going to say that
the graph has nodes V, where
vi is going to be node i
with different
kinds of attributes.
ek is going to
denote a particular edge
in the graph, which is related to
a sender node sk and a receiver
node rk.
And then there's u,
which is a global graph
variable with attributes.
So each of these things
has a feature vector
which is the attributes of
that object in the graph.
So for example, in the
solar system example,
the nodes would potentially
have the positions and
the masses as their attributes.
The edges could have something
like the gravitational
interaction as their attributes.
To give you just an example
when we're walking through this,
I'm going to use this
mass-spring system where
I'm assuming that
the nodes vi are
the masses in the
mass-spring system
and the attributes could
be the mass, position,
and velocity of those objects.
The edges ek are
going to represent
the possible interactions
between the masses.
In this case, springs.
And every edge is directed.
So in this graph, you would
have four edges, one from node 1
to node 2, one from node 2
to node 1, one from node 2
to node 3, and one
from node 3 to node 2.
And then the global
properties u of this graph
could, for example,
be the total energy
of this mass-spring system.
So now to get suddenly-- yep.
AUDIENCE: Sorry, this might be
a little bit [INAUDIBLE] but are
graph networks typically
always direct--
the edges are directional?
KELSEY ALLEN: Yeah.
AUDIENCE: OK.
Cool.
So you would usually
add four as opposed
to two edges for
the two springs.
[INAUDIBLE]
KELSEY ALLEN: Yes.
Yeah.
You could potentially
change that
by having different kinds of
aggregation functions, which
I'll talk about in a second.
But generally, it's
easiest to just describe it
as two different edges.
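To give a preview of the library
we'll use in the hands-on part,
here is roughly how that
three-mass, two-spring graph
could be written down as a data
dict for the graph nets library.
The attribute values here are
made up for illustration.

```python
# Sketch: the mass-spring graph as a graph_nets data dict.
# Node features: (mass, position, velocity); edge features: spring
# constant; global feature: e.g. total energy.
import numpy as np
from graph_nets import utils_np

data_dict = {
    "nodes": np.array([[1.0, 0.0, 0.0],    # node 0
                       [1.0, 1.0, 0.0],    # node 1
                       [1.0, 2.0, 0.0]]),  # node 2
    # One directed edge in each direction for each spring.
    "senders":   np.array([0, 1, 1, 2]),
    "receivers": np.array([1, 0, 2, 1]),
    "edges": np.array([[50.0], [50.0], [60.0], [60.0]]),
    "globals": np.array([0.0]),
}
graphs_tuple = utils_np.data_dicts_to_graphs_tuple([data_dict])
```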
So this is an entire
graph network block.
And what's really critical
is that the graph network
takes as input a graph.
So it takes a set of
nodes, a set of edges,
and a global property
and it outputs a graph.
So it outputs an updated set of
nodes, an updated set of edges,
and an updated set
of global properties.
And when I say updated, I
mean that the attributes
for the nodes, edges, and
global properties will change,
but the structure of the
graph will remain constant.
So that's really
critical, and I'll
talk about that a little
bit more in a bit.
The six functions that we have
in a standard graph network
block are these three
learnable functions, phi
e, which applies to
the edges of a graph,
phi v, which applies to
the nodes of a graph,
and phi u, which is going to
update the global properties
of the graph.
We also have these
aggregation functions,
which are going to allow us to
take the structure of the graph
into account when updating
these properties, which
are an aggregation function that
goes from edges to the nodes,
an aggregation from the
edges to the globals,
and an aggregation from
the nodes to the globals.
And on the next slide,
I'm going to walk
through how each of these
operate in a particular graph.
Any questions about this
before I go to the next slide?
OK.
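In the library, by the way, a
full block like this is a single
module. Here's a sketch; the
layer sizes are arbitrary, and
I'm assuming the graph has been
converted to tensors first.

```python
# Sketch: a full graph network block. It takes a GraphsTuple in and
# returns a GraphsTuple out: same structure, updated attributes.
import sonnet as snt
from graph_nets import modules

gn_block = modules.GraphNetwork(
    edge_model_fn=lambda: snt.nets.MLP([32, 32]),    # phi e
    node_model_fn=lambda: snt.nets.MLP([32, 32]),    # phi v
    global_model_fn=lambda: snt.nets.MLP([32, 32]))  # phi u

# output_graph = gn_block(input_graph)  # e.g. the mass-spring graph from
# before, built with utils_tf instead of utils_np so it holds tensors.
```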
All right, so I'm going
to talk about just
one particular graph update
order for one particular graph.
The first step that
we're going to do
is apply this edge function
to the edge's attribute,
the attribute of the
receiver node for that edge,
the attribute of the
sender node for that edge,
and the global properties.
So in the mass-spring
system, this
could be something
like computing
the forces in the graph for
each different interaction edge.
So like the spring constant
or an updated version
of that, which will allow you
to propagate that information.
So that gives us a
set of values ek prime
or updated edges for
each edge in the graph.
Then we're going to apply the
aggregation function, which
takes the edges and
computes the set of ei prime
with this hat on top of
it, which is going to,
in this case, sum all of the
inputs to a particular node.
So when we apply that
aggregation function
for this node, for example,
what we're going to do
is sum up these different
edges for that node.
And that's what our
aggregation function is doing.
AUDIENCE: Sorry.
KELSEY ALLEN: Yeah.
AUDIENCE: So in the
first step is the--
so this phi e.
KELSEY ALLEN: Yeah.
AUDIENCE: So that would
be kind of start node,
end nodes, then you
look up in the u
it could be the position
and then computing the mass
from that and take the
attribute of the edge to be--
KELSEY ALLEN: So
the u is actually
just some global property.
So often we think about that
like gravity or something
that's affecting
the entire graph.
So it's not really looking
up anything within the u.
It's perhaps even
easier to not even think
about the u at the moment.
But in order to
update the edge, we
take the edge's current value
as well as the nodes that
are connecting-- that
that edge is connecting
and compute the
updated edge value.
AUDIENCE: I'm just
wondering where the--
so the position of the
objects in your system--
KELSEY ALLEN:
That's coming next.
Yeah, so once we've
updated the edges, we then
update the nodes.
AUDIENCE: So in your
paradigm that we
are talking about
the forces, we don't
need previous values of edges
to compute the new ones.
We just, in principle, need
the position or the attributes
of the nodes, right?
KELSEY ALLEN: That's right.
So in this case, we would be--
the simplest thing
that we could learn
would hopefully be
just, yes, that.
AUDIENCE: In general, the update
may use the previous edges.
It's just that in this
case, we don't need them.
KELSEY ALLEN: Yes, right.
AUDIENCE: This way.
KELSEY ALLEN: Yeah.
So critically, that
node rk and node sk
are not the indices
of the nodes.
They're the actual
node attributes.
So the positions of the nodes
that that edge is connecting.
So once we have
that representation,
we then apply our node
function to the aggregated set
of edges for that node as well
as the node's previous value
and the global attribute.
So now we have updated
node attributes.
And that's something like
computing the new positions,
velocities, and kinetic
energies for a particular node
in the graph.
And finally, if we're using
these global properties,
we would aggregate
all of the edges
and all of the
nodes in the graph
and then apply our
learned global function
to the aggregated edges
and aggregated nodes
in the previous global value
to get an updated global value,
like an updated energy.
Yeah?
AUDIENCE: This is kind
of taking a step back.
But can you remind me which of
these values are being learned
and which of them are
I guess the inputs?
KELSEY ALLEN: So phi u, phi
v, and phi e are learned.
The aggregation functions
are constant and the graph
structure itself is constant.
AUDIENCE: So then all the
arguments of those functions
are--
KELSEY ALLEN: The
k's are all constant.
The actual connectivity
of the graph,
but the attributes of each
of those things in the graph
is changing.
Yeah?
AUDIENCE: So the edge update is
always dependent on rk and sk
attributes?
KELSEY ALLEN: Yes.
AUDIENCE: And the
node is always only
dependent on the incoming edges?
KELSEY ALLEN: Yeah.
And you could define
alternative update schemes here.
The really critical
part is these phi
functions are only applying
to-- are applying equally
to all nodes in the graph
or all edges in the graph.
Yeah?
AUDIENCE: Is it possible-- maybe
I'm looking a little bit ahead.
But is it possible to have a
dynamic kind of graph instead
of having to replace the graph?
KELSEY ALLEN: So
dynamic in what sense?
AUDIENCE: Maybe the
model will tell you
for this kind of thing--
maybe you need a new node
to explain this thing.
KELSEY ALLEN: Yeah, so you
would need a separate learning
scheme for that.
There's nothing that says you
can't change the structure
from one iteration to the
next, but this will not predict
what that structure should be.
Yeah?
AUDIENCE: Just have
a naive question.
Because I saw there is two--
one pair of nodes that
has two connect--
actually--
KELSEY ALLEN: This one?
AUDIENCE: Yeah, yeah.
Just wondering, can two nodes
have two edges or more edges?
KELSEY ALLEN: Yeah, there's no
limit on the number of edges.
These could each have a
different edge attribute,
for example, and then
they could encode
different kinds of connections.
Like if you had two
different springs connecting
the same nodes with
different spring constants,
then you might want
two different edges
between those same two nodes.
AUDIENCE: Follow up question.
So you mean you have to train
a new model for that other kind
of graph if you have
a new node or you
have a couple of new nodes--
KELSEY ALLEN: No.
No, no, no.
The model that you're
learning does not
care about the graph structure.
Because each of these
functions is not
dependent on the particular
values of rk and sk.
They take the node attributes
from rk and sk as input,
but the structure
of the graph just
determines the propagation
and the aggregation functions,
while the learned functions are,
in some sense, independent
of the structure.
So you can learn--
when you learn a
graph network, you
can apply it to any
structured graph.
AUDIENCE: But how
did the performance
change in that case?
KELSEY ALLEN: In practice,
you have to look at that empirically.
I will show some examples
of having, for example, you
can train on towers with
different numbers of blocks.
Yeah?
AUDIENCE: Well, I
don't know if you're
going to give an example.
But previously you said
there's one example where
you can use those to
train on some image
and basically use that as
point [INAUDIBLE] in graph.
But now is it that
between one pair of nodes
you can have arbitrary
number of edges?
And so in those cases
where there isn't actually
a theory for how
many edges there are,
like how do you determine
what kind of graph?
Because now you can
have arbitrary number
so the fully connected can
be all kinds of different
fully connected.
KELSEY ALLEN: Yeah.
So in general, in
practice what I have seen
is that people just assume
there's one edge between two
nodes in each direction.
But you could play
around with that.
It's, again, not
something that this model
will be able to predict
for you unless we
can talk about some ways of
extending that at the end.
But there's no obvious way
of doing that with this.
Yes?
AUDIENCE: And I guess
next [INAUDIBLE] to this,
the graphs that you're treating
here cannot be expressed
as simply a matrix with nodes
and the values of the elements
being the edges.
It's more general
here because you
can have more edges with
the same destination.
KELSEY ALLEN: Yeah, yeah, yeah.
So this is more general than
just an adjacency matrix,
for example.
But you could convert an
adjacency matrix into a graph.
Yeah?
AUDIENCE: Another question.
So is ek here learnable or not?
KELSEY ALLEN: So ek is the
input edge representation.
So to go back two
slides, the edges--
actually, one more.
The edges ek are going to
start with some attributes.
And what you're learning
is a transformation
on these attributes.
AUDIENCE: So these
attributes are interpretable?
KELSEY ALLEN: So
when you put them in,
they will be interpretable.
When you then run
your network, it
could embed this in some
high dimensional space,
and it could become
uninterpretable but useful.
And then you can use things
like your standard deep learning
visualization tools
to try to figure out
what those high dimensional
vectors are representing.
AUDIENCE: The same
thing for vi and u.
So they are initialized
as interpretable vectors.
KELSEY ALLEN: And
then we project them
to high dimensional
magic deep learning space
and then something happens
in some-- yeah, right.
All right.
So that really is the core of
graph networks, that one slide.
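Written out, that slide is just
these six steps, where the phi's
are learned and the rho's are
the fixed aggregations:

```latex
e'_k       = \phi^e(e_k, v_{r_k}, v_{s_k}, u)       % update each edge
\bar{e}'_i = \rho^{e \to v}(\{ e'_k : r_k = i \})   % aggregate incoming edges per node
v'_i       = \phi^v(\bar{e}'_i, v_i, u)             % update each node
\bar{e}'   = \rho^{e \to u}(\{ e'_k \})             % aggregate all edges
\bar{v}'   = \rho^{v \to u}(\{ v'_i \})             % aggregate all nodes
u'         = \phi^u(\bar{e}', \bar{v}', u)          % update the global
```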
So now that we've developed a
particular graph network block,
we can actually compose them
in all different kinds of ways,
because each graph network
block takes as input a graph
and outputs a graph.
And so you can connect them
in somewhat arbitrary ways.
So for example, one of the
classic compositions we use,
and the thing that I used
for my internship, is these
encode-process-decode
models where you're
going to take your input graph,
encode those nodes, edges,
and globals into some high
dimensional representation,
and then run multiple steps
of graph net processing
and finally decode
to something you
can then understand, like
the new positions of the nodes.
AUDIENCE: So that means
in practice, you have,
I guess, two sets
or three sets of--
KELSEY ALLEN: Of
learnable functions.
Yeah.
AUDIENCE: One that takes you to a
high dimensional space,
one that processes in that high
dimensional space, and another
that--
KELSEY ALLEN: Exactly.
Yes.
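Here's a rough sketch of that
pattern with the library's
modules; the sizes are arbitrary,
and the library's demos have a
fuller version of this model.

```python
# Sketch: encode-process-decode with graph_nets modules.
import sonnet as snt
from graph_nets import modules, utils_tf

def make_mlp():
    return snt.nets.MLP([64, 64])

# Encoder and decoder update edges, nodes, and globals independently
# (no message passing); the core is a full graph network block.
encoder = modules.GraphIndependent(make_mlp, make_mlp, make_mlp)
core = modules.GraphNetwork(make_mlp, make_mlp, make_mlp)
decoder = modules.GraphIndependent(make_mlp, make_mlp, make_mlp)

def encode_process_decode(input_graph, num_processing_steps):
    latent0 = encoder(input_graph)
    latent = latent0
    for _ in range(num_processing_steps):
        # Concatenate the encoded input back in at each step so the core
        # always sees the original features (a common design choice).
        latent = core(utils_tf.concat([latent0, latent], axis=1))
    return decoder(latent)
```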
So to give you some
intuition for why
we might want multiple
steps of processing,
when you define a
graph, if you take just
one step of propagation,
then for example, to get
this information from this
node to the rest of the graph,
after one step,
you're just going
to be able to propagate it
to its direct neighbors.
After two steps, you'll
be able to propagate it
to all the neighbors.
And after three
steps, you'll also
be able to propagate
it to all the edges.
But here's an example where
that doesn't quite happen.
So at the first
step, you're only
able to propagate information
to this one other node.
And even after three
steps, you actually
only reach three
of the other nodes.
So if it's important that
the information from one
part of your graph gets to
all the rest of your graph,
you might need to take
multiple steps of propagation.
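A quick way to check this for a
given graph, as a sketch of my
own and nothing to do with the
library, is with powers of the
adjacency matrix: after k steps
of propagation, node j's
information can have reached
node i exactly when entry (i, j)
of A^k is nonzero.

```python
# Sketch: how many propagation steps until node 0's information reaches
# everyone? Reachability after k steps on a 4-node chain 0-1-2-3.
import numpy as np

A = np.array([[1, 1, 0, 0],   # A[i, j] = 1 if j sends to i; self-loops
              [1, 1, 1, 0],   # so each node keeps its own information.
              [0, 1, 1, 1],
              [0, 0, 1, 1]])

reach = np.eye(4, dtype=bool)
for k in range(1, 4):
    reach = (A @ reach) > 0
    print(f"after {k} steps, node 0 reaches:", np.where(reach[:, 0])[0])
```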
So here is just-- yep?
AUDIENCE: Go back again.
OK.
Yeah.
KELSEY ALLEN: So here's just
a few different examples
of graph network blocks
that are different versions
of the full block.
So in the simplest
case, you can imagine
an independent recurrent block,
which is not actually using
the graph structure at all.
It's just assuming that
everything is independent
and then you're going to
update the edges, the nodes,
and the globals independently.
So the graph structure
will never affect anything.
Here's a message
passing neural network,
which was also published
around the same time.
And here they don't use globals
in the input representation,
but they do try to predict
globals from the graph.
So it's a sort of minimal change
from the full graph network
thing.
You can also imagine things
like deep sets, which, again,
just learn one or two
functions on the nodes
and the global
properties of the graph
but don't assume any
connections between them.
And so all of these things are
representable in the graph nets
library that we're
going to go through.
We'll talk about
them a bit more then.
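For reference, several of these
variants come ready-made in the
library. A sketch, assuming the
constructors in graph_nets.modules,
which we'll see in the notebook:

```python
# Sketch: some block variants as graph_nets modules (sizes arbitrary).
import sonnet as snt
from graph_nets import modules

def make_mlp():
    return snt.nets.MLP([64])

# Independent block: updates edges, nodes, globals without message passing.
independent = modules.GraphIndependent(make_mlp, make_mlp, make_mlp)
# Deep sets: a shared function on the nodes, aggregated into a global.
deep_sets = modules.DeepSets(node_model_fn=make_mlp, global_model_fn=make_mlp)
# Full graph network block.
full_gn = modules.GraphNetwork(make_mlp, make_mlp, make_mlp)
```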
So how might we actually
use this in a system?
So we can define targets
that we might want
over the nodes, edges,
or the globals of a
graph, since we'll
get updated representations
for each of these things.
So node centric
could be something
like trying to read
off the inferred mass
of an object or the
positions and velocities
at the next time step.
Edge centric could be
something like trying
to predict whether or not
two objects are in contact.
And global centric could be
something like predicting
the energy of a system.
And the input graphs could
be pretty much anything.
You could have structured
graphs with node attributes
given by known
quantities and sparse
connectivity information,
or you could use all-to-all
unstructured graphs,
or hierarchical, tree-structured
graphs for tree-to-tree
learnable networks.
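And concretely, whichever kind
of target you pick is just a
slice of the output GraphsTuple.
A sketch, where the model and
target names are placeholders
borrowed from the earlier
sketches:

```python
# Sketch: reading node-, edge-, or global-centric targets off the output.
import tensorflow as tf

output_graph = gn_block(input_graph)  # any graph net model from above

node_predictions = output_graph.nodes      # e.g. next positions/velocities
edge_predictions = output_graph.edges      # e.g. contact predictions
global_predictions = output_graph.globals  # e.g. system energy

# Train against whichever target you care about, e.g. node centric:
loss = tf.reduce_mean(tf.square(node_predictions - target_nodes))
```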
So the biggest limitation is
that the structure of the graph
is typically not learned.
So at no point are we changing
the receivers and the senders
for a given edge.
And also the
structure of the graph
is typically not changed
as we unroll something.
So inside that
recurrent block, we're
not somehow changing
the actual structure
of the graph within that.
And if you wanted to do so, you
would need some other mechanism
to possibly delete
edges or nodes
or add edges or nodes, which
is something people have
been thinking about.
Another thing I just want to say
is that graph networks will not
cover absolutely everything.
They won't handle things
like recursion, control flow,
or conditional iteration.
And for those kinds
of things, you
might want to use program
induction instead.
And so something
to just think about
as we're going through
some of these examples
is how useful is
this approach if we
can't learn the structure?
Because some people would say
that learning the structure
is really the core
of the problem.
So just something to consider.
And if you're curious,
I have some references
of people who are trying
to learn structure.
Do you have a question?
AUDIENCE: Yeah.
I missed the
original definition.
Can you redefine
what an edge is?
KELSEY ALLEN: Yeah.
It's just denoting that
there is a connection
between some sender node
and some receiver node.
And it has a certain
attribute vector,
but this could be
initialized to be empty.
AUDIENCE: OK, so it's
like, it's a particular,
I guess, it takes
as input a node,
applies some function to
it, and then outputs a node?
And I guess the function
that applies to it
is of a fixed form?
KELSEY ALLEN: So there is a
difference between an edge
existing in the
graph and the edge
function that you're learning.
So the function that you're
learning takes in as input
the current edge
representation and then
the node representations for
the nodes it's connecting
and outputs an updated
edge representation.
AUDIENCE: OK.
And is the edge function, does
that have a particular form?
Is that something with a linear
map or something like that?
KELSEY ALLEN: So it's
a learnable network.
So it's a set of
weights and biases.
So that's the part
that's being learned.
AUDIENCE: OK.
And then so I guess
my actual question was
when you say that there are
things that you can't learn
or you can't represent
with a graph network.
That means that the graph
network itself can't
do something like control flow?
KELSEY ALLEN: Yeah, there isn't
a graph representation that
will give you control flow.
It's actually independent
of the network part.
AUDIENCE: And is that--
is that anything that
looks like control flow?
Or is it--
KELSEY ALLEN: I'm not sure.
We should talk more offline.
Yeah.
AUDIENCE: Thanks.
KELSEY ALLEN: I should
also say that there
is one way you could
imagine learning structure
in these graphs, which is
to always assume everything
is fully connected and
give every edge a weight
and try to learn those weights.
But that's computationally slow.
So when I'm talking about
learning the structure,
I'm talking about really sparse
learning of the structure.
