if we're looking at a sequence of
letters what's likely to come next this
is a prediction problem we're given a
letter the goal is to predict the next
letter in the sequence but this
prediction is impossible without the
context of the sequence of letters for
example here o is followed by three
different letters but in context the
next letter becomes easier and easier to
predict as a sequence progresses long
short term memory networks or LS TMS are
designed for applications where the
input is an ordered sequence where
information from earlier in the sequence
may be important LS TMS are a type of
recurrent network which are networks
that reuse the output from a previous
step as an input for the next step like
all neural networks the node performs a
calculation using the inputs and returns
an output value in a recurrent Network
this output is then used along with the
next element as the inputs for the next
step
and so on in an LS TM the nodes are
recurrent but they also have an internal
state the node uses an internal state as
a working memory space which means
information can be stored and retrieved
over many time steps the input value
previous output and the internal state
are all use in the nodes calculations
the results of the calculations are used
not only to provide an output value but
also to update the state like any neural
network LS TM nodes have parameters that
determine how the inputs are used in the
calculations but LS TMS also have
parameters known as gates that control
the flow of information within the node
in particular how much the safe state
information is used as an input to the
calculations these gate parameters are
weights and biases which means the
behavior depends on the inputs so for
example an input of Q doesn't need much
passed information the next letter is
almost certainly a u but an input of e
well we might need to recall much more
passed information
similarly there are gates to control how
much of the current information is saved
to the state and how much the output is
determined by the current calculation
versus the saved information so LS TM
nodes are certainly more complicated
than regular recurrent nodes but this
makes them better at learning the
complex interdependencies in sequences
of data and ultimately they're still
just a node with a bunch of parameters
and these parameters are learned during
training just like with any other neural
network
