Presenting the LMU: Continuous-time
representation in recurrent neural
networks. This work was a collaboration
between ABR and the Centre for
Theoretical Neuroscience at the
University of Waterloo.
Long Short-Term Memory (LSTM) networks are arguably the most successful type of RNN. They have been applied to many real-world problems, ranging from image caption generation to music composition. However, they break down when tasked with learning temporal dependencies that span thousands of time-steps, making them very difficult to scale up in practice to leverage long windows of time.
The Legendre Memory Unit (LMU) is a novel type of RNN built around a memory cell with a special weight structure, mathematically derived using dynamical systems theory to optimally maintain a continuous-time, scale-invariant memory. It orthogonally represents a sliding window of history as a linear combination of Legendre polynomials, making it very resource-efficient with respect to the length of the window.
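To make this concrete, below is a minimal NumPy sketch of the LMU's linear memory, following the (A, B) state-space matrices given in the paper; the order d, window length theta, time-step dt, and test signal are illustrative choices, and zero-order-hold discretization is one of several options.

    import numpy as np
    from scipy.signal import cont2discrete

    d, theta, dt = 12, 1.0, 1e-3  # order, window length (s), step size (s)

    # State-space matrices of the Legendre delay system:
    #   a_ij = (2i+1) * (-1 if i < j else (-1)^(i-j+1)),  b_i = (2i+1) * (-1)^i
    Q = np.arange(d)
    A = np.array([[(2*i + 1) * (-1.0 if i < j else (-1.0)**(i - j + 1))
                   for j in range(d)] for i in range(d)])
    B = ((2*Q + 1) * (-1.0)**Q)[:, None]

    # Discretize theta * m'(t) = A m(t) + B u(t) with zero-order hold
    Ad, Bd, *_ = cont2discrete((A/theta, B/theta, np.eye(d), np.zeros((d, 1))),
                               dt=dt)

    # Run the memory over a test signal and decode u(t - theta) from m(t);
    # the decoding weights are shifted Legendre polynomials evaluated at
    # r = theta'/theta (here r = 1, the oldest edge of the window)
    w = np.array([np.polynomial.Legendre.basis(i)(2*1.0 - 1) for i in range(d)])
    ts = np.arange(0, 4, dt)
    u = np.sin(2*np.pi*ts)
    m, recon = np.zeros(d), []
    for u_t in u:
        m = Ad @ m + Bd[:, 0] * u_t
        recon.append(w @ m)

    k = int(2.5 / dt)  # once theta seconds have filled the window...
    print(recon[k], u[k - int(theta/dt)])  # ...these should roughly agree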
The hidden state of the LMU and the memory cell learn to mutually interact: the state computes nonlinear functions across time, while these nonlinear units in turn learn which information should be orthogonalized by the linear dynamical memory.
The LMU can be expressed in Keras or TensorFlow and trained via backpropagation through time. It sets a new state-of-the-art result on permuted sequential MNIST, a difficult RNN benchmark; outperforms LSTMs in memory capacity by several orders of magnitude; and has been shown to scale to input sequences spanning hundreds of millions of time-steps.
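In practice this is packaged as a standard layer: the open-source keras-lmu package exposes the LMU for Keras/TensorFlow. The following is a minimal sketch of a permuted sequential MNIST classifier, assuming that package's API; the hyperparameters (order, theta, hidden units) are illustrative rather than tuned.

    import tensorflow as tf
    import keras_lmu  # pip install keras-lmu

    inputs = tf.keras.Input((784, 1))  # psMNIST: 784 pixels, one per step
    lmu = keras_lmu.LMU(
        memory_d=1,       # dimensionality of the signal written to memory
        order=256,        # number of Legendre coefficients kept
        theta=784,        # length of the sliding window, in time-steps
        hidden_cell=tf.keras.layers.SimpleRNNCell(212),
    )(inputs)
    outputs = tf.keras.layers.Dense(10)(lmu)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer="adam",
        metrics=["accuracy"],
    )
    # model.fit(...) then trains the whole network with ordinary
    # backpropagation through time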
In addition, we have shown that a deep
temporal hierarchy of LMUs outperforms
an equivalent stack of LSTMs in
predicting a nonlinear chaotic time
series. Interestingly, interleaving the
two networks produced the lowest
prediction error on this task.
The LMU can be implemented using spiking neurons, and the resulting patterns of spiking activity have been linked to the neural time cells observed in the hippocampus, striatum, and medial prefrontal cortex.
The precision of a Poisson-spiking implementation scales with the square root of the spike count, which has enabled small-scale versions to be implemented on neuromorphic chips such as Braindrop and Loihi, special-purpose hardware that can simulate spiking networks with extraordinary energy efficiency. The LMU thus demonstrates an algorithmic advance for processing continuous data online using low-power neuromorphic devices.
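As a quick illustration of that square-root scaling, the toy simulation below estimates a firing rate from Poisson spike counts over progressively longer windows; the relative error of the estimate falls as one over the square root of the expected spike count (the rate and window lengths here are arbitrary choices).

    import numpy as np

    rng = np.random.default_rng(seed=0)
    rate = 40.0  # true firing rate in Hz (illustrative)

    for T in (0.1, 1.0, 10.0, 100.0):        # integration window in seconds
        counts = rng.poisson(rate * T, size=10_000)
        rel_err = (counts / T).std() / rate  # relative error of rate estimate
        # For Poisson counts, std = sqrt(mean), so rel_err ~ 1/sqrt(rate*T)
        print(f"T={T:6.1f}s  expected count={rate*T:7.0f}  "
              f"rel err={rel_err:.4f}  1/sqrt(count)={1/np.sqrt(rate*T):.4f}")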
