In the last video, we looked at some of the inputs that we can provide to a neural network. For example, we can use encodings. This is an example of one-hot encoding, where you represent each word with a field in a vector: if you have three words, then you have a vector with three fields. You could also use embeddings, where you have, for example, a 200-length vector whose fields capture properties of the words that surround the original word; so for a word like king, the weights for possible neighbors of that word.
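To make these two representations concrete, here is a minimal sketch in Python, assuming a toy three-word vocabulary and a made-up embedding table; a real embedding table, learned from neighboring words, would have, say, 200 columns per word.

```python
import numpy as np

# Toy three-word vocabulary, as in the example above.
vocab = ["king", "queen", "car"]

def one_hot(word):
    """One field per vocabulary word: 1 in the word's field, 0 elsewhere."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

# A made-up embedding table: one dense row per word. A real, learned
# table would be wider (e.g. 200 columns) and trained from context words.
embeddings = np.array([
    [0.8, 0.1, 0.6],   # king
    [0.7, 0.9, 0.5],   # queen
    [0.1, 0.2, 0.0],   # car
])

print(one_hot("king"))                  # [1. 0. 0.]
print(embeddings[vocab.index("king")])  # dense vector for "king"
```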
So you could have encodings or embeddings; you could have audio data, for example data extracted from a spectrogram of a recording of a human voice; and you could have video data, for sign language, for example, where you can detect the depth and motion of hands but also of parts of people's faces.
And now let's take a moment to consider
what kind of outputs we can get
from a neural network.
Most algorithms are going to give you a vector, or a sequence of vectors, as an output.
For example, the output could be logits passed through a softmax: if you have a vocabulary of 10,000 words, each of them will have a probability, and there will be one word with a very high probability, the word buy, for example, and then you know that the network is giving you the word buy. The same can happen with visual networks, where you can have, for example, 10 handwritten digits and an output vector of length 10: if the highest value is in the zeroth position, it's the number zero; if the highest value is in the fifth position, it's the number five; and so forth. So neural networks are always going to give you some form of vector representing what could be a word, for example, or a sequence of vectors for the words in a sentence.
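As a minimal sketch of how such an output is read, assuming a toy five-word vocabulary standing in for the 10,000 words and made-up logit values:

```python
import numpy as np

def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    exps = np.exp(logits - np.max(logits))  # subtract the max for stability
    return exps / exps.sum()

# Hypothetical logits for a toy 5-word vocabulary.
vocab = ["buy", "sell", "wash", "drive", "paint"]
logits = np.array([4.2, 1.1, 0.3, 2.0, -0.5])

probs = softmax(logits)
print(probs)                          # "buy" gets by far the highest probability
print(vocab[int(np.argmax(probs))])   # -> buy
```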
But doing that gives you enormous flexibility to perform all sorts of really interesting tasks.
For example, we could do neural text generation, where we provide some words to the program, like The spaceship entered orbit around the planet, and then the neural network tries to guess the next word, and the next word, and the next word. This is an example of what the site Talk to the Transformer did: Once in orbit, the ship jumped to hyperspace, leaving the planet. It then began traveling in an elliptical pattern moving up towards the spherical. That's very good English, and I urge you to go to the website and give it a try. This model is called GPT-2.
It's a kind of transformer; we'll look at them in the next videos. It was trained on 40 gigabytes of text from 8 million web pages, and the whole neural network has about 1.5 billion parameters, that is, weights and other values you can set. So it's not small or easy to run, as you can see.
But it generates very good English. Again, what this does is take a word like spaceship, and the network gives you the output entered; take a word like elliptical, and the network gives you the output pattern. It generates the next word, and in doing this, it is a language model.
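If you want to try this outside the website, here is a minimal sketch, assuming the Hugging Face transformers library and its publicly released GPT-2 checkpoint (my choice of tooling, not something shown in the video); the continuation will differ on every run:

```python
from transformers import pipeline

# Load the publicly released GPT-2 weights as a text-generation pipeline.
generator = pipeline("text-generation", model="gpt2")

prompt = "The spaceship entered orbit around the planet"
# Ask the model to keep guessing the next word, ~30 new tokens in total.
result = generator(prompt, max_new_tokens=30, num_return_sequences=1)
print(result[0]["generated_text"])
```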
Networks like this can perform many tasks, for example filling in missing words. If you have a sentence like I want to ___ the car because it is cheap, and you provide that to a type of model called BERT, the BERT model will correctly predict that what was masked was the word buy. And as a matter of fact, you will find this code in your canvas: the code to get a BERT model to guess a missing word in the input.
This model has 24 layers, with 1024 neurons in the hidden layers, 16 of something called attention heads, and about 340 million parameters. So it's very large as well. What it does is look at a sentence, find the gap, and try to predict what could have gone there in the middle.
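The code in your canvas is the reference, but as a minimal sketch, a prediction like this can be run with the transformers fill-mask pipeline; the checkpoint name below is my assumption about which model matches the 24-layer, 340-million-parameter description above:

```python
from transformers import pipeline

# BERT-large: 24 layers, 1024 hidden units, 16 attention heads, ~340M parameters.
unmasker = pipeline("fill-mask", model="bert-large-uncased")

# [MASK] stands for the missing word the model must guess.
sentence = "I want to [MASK] the car because it is cheap."
for candidate in unmasker(sentence)[:3]:
    print(candidate["token_str"], round(candidate["score"], 3))
```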
Let's look at this example. A network called a sequence-to-sequence model encodes the sentence and passes something to another neural network that gives you I am a student. The encoder reads the French input, je suis étudiant, one word at a time: it gets je and outputs some vector, then takes these two things, the next word and that vector, and so on, until the whole sentence is condensed into one vector. The decoder then starts giving you I am a student, one word at a time. So each side works on a sequence of words, and the encoder passes a vector to the decoder. In doing this, a neural network can translate one language into another, as you can see here.
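As a sketch of translation in practice, assuming the transformers library and a pretrained French-to-English checkpoint from the Helsinki-NLP collection (a transformer encoder-decoder rather than the recurrent network pictured here, but the same encode-then-decode idea):

```python
from transformers import pipeline

# Encoder-decoder translation model: encode French, decode English.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

print(translator("je suis étudiant")[0]["translation_text"])
# Expected output along the lines of: "I am a student."
```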
Neural networks can take images or sound and classify them. For example, they can see a part of a spectrogram and tell you that the sound e was present there. And if it does this with more and more sounds, it will slowly give you a dictated form of the words in the sound wave. Or, as you can see in the images, you get an image, you transform it into a matrix, and then you get, for example, the transcription from ASL fingerspelling to English.
One very cool thing that neural networks can do, that deep learning can do, is take multi-modal inputs and outputs. It can convert an image into a natural language description of the image, a herd of zebras, or it can take a natural language description and then try to generate a picture that matches that description.
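As a sketch of the image-to-description direction, assuming the transformers image-to-text pipeline and a publicly available ViT-plus-GPT-2 captioning checkpoint; both the model and the image path below are my own placeholders, not assets from the video:

```python
from transformers import pipeline

# A vision encoder (ViT) feeding a language decoder (GPT-2) to caption images.
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

# Any local file or image URL works here; "zebras.jpg" is a placeholder.
caption = captioner("zebras.jpg")
print(caption[0]["generated_text"])  # e.g. "a herd of zebras standing in a field"
```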
Neural networks can also do things like find answer spans. This is something that BERT models can do. For example, there's a data set called the Stanford SQuAD, for questions and answers. What the data set has is questions, like What causes precipitation to fall?; contexts where you can get the answer, like In meteorology, precipitation is a product of the condensation of water vapor that falls under gravity; and then the answer, gravity. What this network learns to do is find two numbers: the index into the string for the start of the answer, and the index for the end of the answer in the string. So let me say that again: what it learns are two numbers, the part of the string where the answer begins and the part of the string where the answer ends, gravity.
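As a minimal sketch of this span-finding, assuming the transformers question-answering pipeline and a small SQuAD-fine-tuned checkpoint of my choosing; notice that the result carries exactly the two numbers described above, the start and end indices of the answer in the context:

```python
from transformers import pipeline

# A small model fine-tuned on SQuAD for extractive question answering.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = ("In meteorology, precipitation is a product of the condensation "
           "of water vapor that falls under gravity.")
question = "What causes precipitation to fall?"

result = qa(question=question, context=context)
# The model predicts where the answer span begins and ends in the context.
print(result["answer"], result["start"], result["end"])
```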
BERT models can be adapted to many things; this is just one of them. So as you can see, this is just a sampling of what these structures can do. They are very flexible and can generate all sorts of outputs. Neural networks, and deep learning models in general, take vectors or sequences of vectors as inputs, and they produce vectors or sequences of vectors as outputs. But we can use these as the next word in a sentence, the foreign-language version of a word in our language, the point in a paragraph where you have the answer to a question, the written version of some sound, or the written description of a picture. So there are many ways that these structures can help us, because they have some internal knowledge about language. They are language models, but as we'll see in the final video of the week, these are very opaque language models, and it's really difficult to understand what exactly the computer knows about language.
In the next few videos, we will look at deep learning and how we can apply it to natural language processing.
