Okay.
So first, let me talk about myself for just
a second.
I'm a current PhD student in natural language
processing at the University of Edinburgh
in Scotland, which has its own strike going
on right now.
Solidarity!
(applause)
I'm currently interning at Google in New York.
Temporarily.
I dictate all my code, because due to a disability,
it's hard for me to type much.
My roller derby name is Gaussian Retribution,
and my Twitter handle is @nsaphra.
I just got a tonsillectomy a few days ago.
So usually my voice sounds authoritative but
alluring, like a sultry James Earl Jones but
today I sound like a Muppet.
Alright.
So there are a lot of kinds of deep neural networks, right?
The simplest kind, the first thing I'll introduce you to, is a feed-forward neural network.
You have an input x, some vector, that goes into a function involving a matrix multiplication followed by a non-linearity.
The output from that goes into another function, a different module, and eventually you get a prediction vector, ŷ.
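A minimal sketch of that kind of feed-forward network in PyTorch; the layer sizes here are made up for illustration:

```python
import torch
import torch.nn as nn

# A feed-forward network: each module is a matrix multiplication
# followed by a non-linearity, and each module's output feeds the next.
class FeedForward(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.layer1 = nn.Linear(in_dim, hidden_dim)
        self.layer2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        h = torch.relu(self.layer1(x))  # matrix multiply + non-linearity
        return self.layer2(h)           # the prediction vector ŷ

x = torch.randn(4, 10)                  # a batch of 4 input vectors
y_hat = FeedForward(10, 32, 5)(x)
print(y_hat.shape)                      # torch.Size([4, 5])
```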
There are other kinds, like recurrent networks.
With a recurrent network, you can iterate over a bunch of items in a sequence, because the same module gets applied repeatedly to each item in that sequence.
They also work in parallel: generally x is not actually going to be a vector, it's going to be a matrix, because the network processes a whole batch of inputs at once.
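Here's what that iteration looks like with a single recurrent cell; nn.RNNCell is standing in for whatever recurrent module you'd actually use, and the sizes are arbitrary:

```python
import torch
import torch.nn as nn

# One recurrent cell, applied over and over: the SAME module (and the
# same weights) processes each item in the sequence in turn.
cell = nn.RNNCell(input_size=10, hidden_size=20)

seq = torch.randn(6, 4, 10)   # 6 timesteps; each timestep is a matrix,
h = torch.zeros(4, 20)        # because a batch of 4 runs in parallel
for x_t in seq:
    h = cell(x_t, h)          # same module applied at every step
print(h.shape)                # torch.Size([4, 20])
```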
Generally you have your forward pass, which is what happens during inference, when you're trying to produce output: the input goes in, goes up the computation graph, and produces your output.
Then when you're training, you take the derivative of the error and pass it back through each module, down to the bottom, which is how you train the weights of the matrices inside those modules.
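A sketch of both passes, with a stand-in loss just to have an error to differentiate:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 5))
x = torch.randn(4, 10)

# Forward pass: the input goes up the computation graph to the output.
y_hat = model(x)

# Backward pass: differentiate the error and pass the gradient back
# down through each module, filling in .grad for every weight matrix.
error = y_hat.pow(2).mean()        # a stand-in for a real loss
error.backward()
print(model[0].weight.grad.shape)  # torch.Size([20, 10])
```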
Now, I don't really get what's happening during training or inference, while the model is running.
I want to see what's happening with the representations.
Maybe I want to save some heat maps of the different activations.
Maybe I want to look at the concentration of gradients, which indicates whether the network is memorizing rather than learning a general function.
Maybe I just want to see the magnitude of the error at each module.
But all I have is inputs and some PyTorch modules, trained or not.
So what am I gonna do?
Hooks!
Yeah!
A hook is a function that you associate with a particular module, so that when the actual function of that module gets run, it simultaneously passes its inputs and outputs to the hook.
That's what happens during a forward pass with a forward hook.
During a backward pass with a backward hook, it's the gradients, the derivatives, that get passed: both the gradient at the input and the gradient at the output go to the hook, so it gets both.
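A small demonstration of both hook types on a single linear layer; the hooks here just record shapes, but they could save heat maps or gradient statistics instead:

```python
import torch
import torch.nn as nn

seen = []

def forward_hook(module, inputs, output):
    # Runs whenever the module's forward runs; gets the module,
    # its inputs, and its output.
    seen.append(("forward", tuple(output.shape)))

def backward_hook(module, grad_input, grad_output):
    # Runs during the backward pass; gets the gradients at the
    # module's input and at its output.
    seen.append(("backward", tuple(grad_output[0].shape)))

layer = nn.Linear(10, 5)
layer.register_forward_hook(forward_hook)
layer.register_full_backward_hook(backward_hook)

y = layer(torch.randn(4, 10, requires_grad=True))
y.sum().backward()  # triggers the backward hook
print(seen)         # [('forward', (4, 5)), ('backward', (4, 5))]
```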
But that's a lot of raw data getting passed to the hook, and its type is just: a tensor with this dimension and this dimension and this dimension.
Usually you have no idea what any of those dimensions are doing.
So I have a little trick, which is that I set every single hyperparameter associated with some dimension to a distinct prime number.
Then even if two of them get collapsed, you know where the number came from: if you've set one to 3 and another to 7 and you get an input with a dimension of 21, that can only be 3 times 7, and you can just reshape it accordingly.
In this case, I'm just using a dummy hook that prints out information about the input and output types, and you can tell which hyperparameters are associated with which dimensions.
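A sketch of the prime-number trick with a dummy hook; the specific primes and the Linear layer are just for illustration:

```python
import torch
import torch.nn as nn

# Every hyperparameter tied to a dimension gets a distinct prime, so
# any size showing up in a hook's tensors is immediately identifiable,
# even when two dimensions have been collapsed into one.
BATCH, IN_DIM, OUT_DIM = 3, 7, 11

def dummy_hook(module, inputs, output):
    # A dummy hook that just prints type and shape information.
    print(module.__class__.__name__,
          "in:", [tuple(t.shape) for t in inputs],
          "out:", tuple(output.shape))

layer = nn.Linear(IN_DIM, OUT_DIM)
layer.register_forward_hook(dummy_hook)
out = layer(torch.randn(BATCH, IN_DIM))
# prints: Linear in: [(3, 7)] out: (3, 11)

# A flattened tensor of size 21 can only be BATCH * IN_DIM (3 * 7),
# so it can be reshaped accordingly.
flat = torch.randn(BATCH * IN_DIM)
restored = flat.reshape(BATCH, IN_DIM)
```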
But then, say I've got that transformer model from the beginning.
There are so many modules.
Am I really going to have to add a hook for every single one of them?
No.
You can add them recursively.
There's this little function, named_children(), which finds all the child modules of a particular module, so you can recursively go through the entire module tree.
That's what I like to do, because I don't like to manually associate a hook with each module.
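One way to sketch that recursive registration (the helper name here is mine, not PyTorch's):

```python
import torch
import torch.nn as nn

hooked = []

def add_hooks(module, hook):
    # named_children() yields only the immediate children, so recursing
    # walks the entire module tree; no manual bookkeeping per module.
    module.register_forward_hook(hook)
    hooked.append(module.__class__.__name__)
    for _, child in module.named_children():
        add_hooks(child, hook)

def print_shapes(module, inputs, output):
    print(module.__class__.__name__, tuple(output.shape))

model = nn.Sequential(nn.Linear(3, 7), nn.ReLU(), nn.Linear(7, 5))
add_hooks(model, print_shapes)
model(torch.randn(2, 3))  # every module's hook fires
print(hooked)             # ['Sequential', 'Linear', 'ReLU', 'Linear']
```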
All right.
I'm going to end by talking about the grossest thing I've ever done with a hook.
There's this model, the long short-term memory network, the LSTM.
PyTorch makes it completely opaque what's going on inside, but I really wanted to look at what was happening in the gating mechanisms.
I didn't trust myself to write an LSTM module that would definitely give the same outputs as the real one, so I wrote an LSTM module that ran at the same time as the real LSTM module and just asserted that the outputs were the same.
And that's the most disgusting thing I've ever done with hooks.
That's my confession.
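A minimal sketch of that shadow-LSTM confession, assuming a single-layer, unidirectional nn.LSTM with no initial state; the gate order and weight names are PyTorch's own:

```python
import torch
import torch.nn as nn

input_size, hidden_size = 3, 5
lstm = nn.LSTM(input_size, hidden_size)

def shadow_lstm_hook(module, inputs, output):
    # Forward hook on the real LSTM: re-run a hand-written LSTM on the
    # same input using the real module's own weights, exposing the gate
    # activations, and assert the outputs match at every timestep.
    seq = inputs[0]                          # (seq_len, batch, input_size)
    real_out, _ = output
    h = torch.zeros(seq.shape[1], hidden_size)
    c = torch.zeros_like(h)
    for t, x_t in enumerate(seq):
        gates = (x_t @ module.weight_ih_l0.T + module.bias_ih_l0
                 + h @ module.weight_hh_l0.T + module.bias_hh_l0)
        i, f, g, o = gates.chunk(4, dim=-1)  # PyTorch's gate order
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)                    # the gate values we wanted to see
        c = f * c + i * g
        h = o * torch.tanh(c)
        assert torch.allclose(h, real_out[t], atol=1e-5)

lstm.register_forward_hook(shadow_lstm_hook)
out, _ = lstm(torch.randn(4, 1, input_size))  # hook fires, asserts pass
```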
And actually, that's it!
Ha ha!
(applause)
