
So I wanted to make a video about GPT-2, because it's been in the news recently, this very powerful language model from OpenAI, and I thought it would make sense to start by just doing a video about transformers and language models in general, because GPT-2 is a very large language model implemented as a transformer. But you have a previous video about generating YouTube comments, which is the same kind of task, right?
Right, that's a language modeling task [unintelligible].
I believe that video was made October 2017 and this paper came out December 2017, which has kind of revolutionized the way that people carry out that kind of task. That's not GPT-2, that's something before it, right?
That's the transformer, which is a relatively new architecture for neural networks. It can actually do all kinds of tasks, but it's especially good at this kind of language modeling task.

A language model is a probability distribution over sequences of tokens, or symbols, or words, in a language. So for any given sequence of tokens, it can tell you how likely that sequence is. If you have a good language model of English, it can look at a sequence of words or characters and say how likely that is to occur in English, how likely it is to be an English phrase or sentence. And when you have that, you can use it for a lot of different tasks. If you want to generate text, you can just sample from that distribution and keep feeding it its own output. And to be clear, sampling from a distribution means you're rolling the dice on that probability distribution and taking whichever outcome comes out.
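A minimal sketch of what "a probability distribution over sequences" means in practice; the conditional probabilities here are invented purely for illustration, not taken from any real model:

```python
import math

# Toy next-word model: P(next_word | previous_word), with made-up numbers.
cond_prob = {
    ("<s>", "the"): 0.20,
    ("the", "cat"): 0.10,
    ("cat", "sat"): 0.05,
}

def sequence_log_prob(words):
    """Score a sentence by chaining conditional probabilities:
    P(w1..wn) = P(w1|<s>) * P(w2|w1) * ...  (log space to avoid underflow)."""
    logp = 0.0
    prev = "<s>"  # start-of-sentence marker
    for w in words:
        logp += math.log(cond_prob.get((prev, w), 1e-9))  # tiny floor for unseen pairs
        prev = w
    return logp

print(sequence_log_prob(["the", "cat", "sat"]))  # higher = more plausible English
```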

So you can sample a word, and then say, okay, conditioning on that, given that the first word of this sentence is "the", what does the probability distribution look like for the second word? Then you sample from that distribution, and it's, you know, "the cat", and you say, given that it's "the cat", what's likely to come next, and so on. So you can build up a string of text by sampling from your distribution. That's one of the things you can use it for. And most of us have an example of this sort of thing in our pockets.
That's absolutely right, and that's the way that most people interact with a language model.
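The generate-by-sampling loop being described, as a toy sketch; the distributions here are invented, where a real model would learn them from data:

```python
import random

# Hypothetical next-word distributions, keyed on the previous word only;
# the numbers are invented for illustration.
next_word_dist = {
    "<s>": {"the": 0.5, "i": 0.5},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
}

def sample(dist):
    # "Rolling the dice" on the distribution: each word is chosen
    # with probability proportional to its weight.
    words = list(dist)
    return random.choices(words, weights=[dist[w] for w in words])[0]

def generate(n_words):
    prev, out = "<s>", []
    for _ in range(n_words):
        dist = next_word_dist.get(prev, {"<end>": 1.0})
        w = sample(dist)   # sample a word...
        out.append(w)
        prev = w           # ...then condition on it and sample the next
    return " ".join(out)

print(generate(3))  # e.g. "the cat sat"
```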

This is how I often start a sentence, apparently: taking the phone keyboard's suggestions starting from "I" gives "I am not sure if you have any questions or concerns, please visit the plugin settings so I can do it for the first time in the future of". That's no good. Here's a different option; let's just see where this one goes. Maybe the same. "I am in the morning, but I can't find it on the phone screen from the phone screen on the phone screen on the phone screen on the phone screen on the phone screen." I don't actually know how this is implemented. It might be a neural network, but my guess is that it's some kind of Markov model, Markov chain type setup, where for each word in your language you look at your data set and you see how often each other word follows that word, and that's how you build your distribution. So for the word "I", the most common word to follow it is "am", and there are a few others. So this is a very simple model, and this sentence, "on the phone screen on the phone screen on the phone screen", is actually very unlikely, right?
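A sketch of that count-based Markov chain setup; the tiny corpus is made up, where a phone would count over a much larger one:

```python
from collections import Counter, defaultdict

# Conditioning on the single previous word only, as described above.
corpus = "i am happy . i am here . i think so .".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1  # how often does `nxt` follow `prev`?

# Top suggestions after "i", like the three words above a phone keyboard.
print(follows["i"].most_common(3))  # [('am', 2), ('think', 1)]
```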

Japanese: 
プラグインの設定ですので、将来的には初めてでも構いません。
これは別の選択肢です。これを見てみましょう。たぶん同じ
私は朝です
しかし私は電話スクリーンの電話スクリーンの電話スクリーンの電話スクリーンから電話スクリーンでそれを見つけることができません
電話画面でこれがどのように実装されているのか、実際にはわかりません
それはニューラルネットワークかもしれませんが、それはある種のものだと私は思います
マルコフモデルマルコフチェーンタイプセットアップのようにあなただけのところ
あなたの言語の各単語について、あなたはあなたのデータセットを見て、あなたはどれくらい頻繁に特定の
他の単語同士の関係
その言葉に続いて、それがあなたがあなたのディストリビューションを構築する方法です
したがって、「私」という言葉が「am」であり、他にもいくつかあるという最も一般的な言葉が好きです。
これはとても単純なモデルのようなものです。
電話スクリーンの電話スクリーンの電話スクリーンの電話スクリーンの電話スクリーンのこの文
彼は実際には非常にありそうもないですね。

Turkish: 
This is the super low probability sentence where I would somebody type this and the thing is it's like myopic
It's only I'm not sure I even it's probably only looking at the previous word
It might be looking at like the previous two words, but the problem is to look back. It becomes extremely expensive
Computationally expensive right?
Like you've got I don't know 50,000 words that you might be looking at and so then it so you're you're you're remembering
50,000 probability distributions or
50,000 top three words
but you know then if you want to do
2, that's 50,000 squared right and if you want to go back three words
You have to cube it. So you like raising it to the power of the number of words back you want to go which is
Which means that this type of model?
Basically doesn't look back by the time we're saying on the it's already forgotten the previous time
It said on the it doesn't realize that it's repeating itself and there are slightly better things you can do in this general area
But like fundamentally if you don't remember you're not going to be able to make good sentences
If you can't remember the beginning of the sentence by the time you're at the end of it, right?
and

It's a super low probability sentence; nobody would actually type this. And the thing is, it's myopic: it's probably only looking at the previous word, or maybe the previous two words. The problem is that looking further back becomes extremely computationally expensive. You've got, say, 50,000 words that you might be looking at, so you're remembering 50,000 probability distributions, or 50,000 top-three-word lists. But if you want to condition on two words, that's 50,000 squared, and if you want to go back three words you have to cube it. You're raising it to the power of the number of words you want to look back, which means this type of model basically doesn't look back. By the time we're saying "on the", it's already forgotten the previous time it said "on the"; it doesn't realize that it's repeating itself. There are slightly better things you can do in this general area, but fundamentally, if you can't remember the beginning of the sentence by the time you're at the end of it, you're not going to be able to make good sentences.
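The blow-up, concretely:

```python
# With a 50,000-word vocabulary, the number of contexts a table-based model
# must store grows exponentially with how many previous words it conditions on.
vocab = 50_000
for n in range(1, 4):
    print(n, "word(s) of context:", vocab ** n, "possible contexts")
# 1 -> 50,000;  2 -> 2,500,000,000;  3 -> 125,000,000,000,000
```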

Japanese: 
これは私が誰かがこれをタイプするであろう超低確率の文であり、事はそれが近視のようなものです
それは、私が前の言葉を見ているだけかもしれないと確信できないだけです。
前の2つの単語のように見ているかもしれませんが、問題は振り返ることです。非常に高価になる
計算コストが高い？
あなたが持っているように私はあなたが見ているかもしれないと私はあなたが覚えているあなたがいるかもしれないので、それはあなたがあなたが覚えているかもしれないことを50,000語知らない
50,000確率分布または
5万トップ3ワード
あなたがしたい場合は、あなたはそれから知っている
2、これは50,000の2乗です。もしあなたが3つの言葉に戻りたいのであれば
あなたはそれを立てる必要があります。それで、あなたはそれをあなたが行きたい言葉の数の力に戻すのが好きです。
これは、このタイプのモデルということですか？
基本的に、前回忘れていたことを言っている時間までに振り返らないでください。
それはそれがそれ自身を繰り返すことを理解していないし、この一般的な分野であなたができることがわずかに良いことがあることをそれが言った
あなたが覚えていないのであれば基本的にのようにあなたは良い文を作ることができるようになるだろうことはないだろう
文の終わりに達するまでに文の始まりを思い出せない場合は、そうでしょうか。
そして

Japanese: 
そう
言語モデルにおける大きな進歩分野の1つは、長期的な依存関係の処理です。
私はあらゆる種類の依存関係を扱うことを意味しますが、特に長期の依存関係
Shawnがビデオを録画するためにハックスペースにやってきたような文章を持っています。
あなたのモデルが良ければ、その状況では真っ白
あなたはおそらく代名詞のように期待しているので、それは彼女のような理由です。
あなたはそれらを何でも知っていますが、関連する情報は短い言葉です。
これは文の冒頭にあるすべての方法に似ています
だからあなたのモデルはああ言うことができる必要があります、大丈夫、あなたはショーンがそれを知っている
通常は男性の代名詞に関連付けられているので、ここに男性の代名詞を入れます。モデルに振り返る能力がない場合
あるいは、そのときに言ったことを覚えておくだけでいいのです。
あなたはこれらの文で終わること？
どこにも行かないように
それは推測をするかもしれないようにそれはほんのわずかです
代名詞をランダムに推測して間違った結果を出す可能性があります。

Turkish: 
so
One of the big areas of progress in language models is handling long term dependencies
I mean handling dependencies of any kind but especially long term dependencies
You've got a sentence that's like Shawn came to the hack space to record a video and I talked to
Blank right in that situation if your model is good
you're expecting like a pronoun probably so it's it's she they
You know them whatever and but the relevant piece of information is the words short
Which is like all the way at the beginning of the sentence
so your model needs to be able to say oh, okay, you know Shawn that's
Usually associated with male pronouns, so we'll put the male pronoun in there. And if your model doesn't have that ability to look back
Or to just remember what it's just said then
You end up with these sentences that?
Like go nowhere
It's just a slight like it might make a guess
just a random guess at a pronoun and might get it wrong or it might just

So one of the big areas of progress in language models is handling long-term dependencies. I mean handling dependencies of any kind, but especially long-term dependencies. You've got a sentence like "Shawn came to the hack space to record a video, and I talked to ___". In that situation, if your model is good, you're expecting a pronoun, probably: "she", "they", "them", whatever. But the relevant piece of information is the word "Shawn", which is all the way back at the beginning of the sentence. So your model needs to be able to say, oh, okay, "Shawn", that's usually associated with male pronouns, so we'll put the male pronoun in there. And if your model doesn't have that ability to look back, or to just remember what it's just said, you end up with these sentences that go nowhere. It might make just a random guess at a pronoun and get it wrong.

Or it might just say "and I talked to" and then introduce a completely new name, "Frank", because it's guessing at what's likely to come there and it's completely forgotten that Shawn was ever a thing. So yeah, these kinds of dependencies are a big issue for things you would want a language model to do. But so far we've only talked about language models for generating text this way; you can also use them for all kinds of different things. People use language models for translation, obviously: you have some input sequence in English and you want to output a sequence in French or something like that, and having a good language model is really important so that you end up with something that makes sense. Summarization is a task that people often want, where you read in a long piece of text and generate a short piece of text that's a summary of it. That's the kind of thing you would use a language model for. Or reading a piece of text and then answering questions about that text.

Japanese: 
と話した後、
Frank、新しい名前が導入されたのと同じように、そこに来る可能性があると推測していることを確信しています。
物事が好きです。だから、ええ、これらの種類の依存関係はあなたが言語モデルにやらせたいことで大きな問題です
しかし、これまでに話し合ったことがあります
このようにテキストを生成するための言語モデルですが、あらゆる種類のものにそれらを使用することもできます。以下のようなので
人々は翻訳に言語モデルを使う
明らかに、英語のような入力シーケンスがあり、フランス語またはそのようなものでシーケンスを出力したいのです。
良い言語モデルを持つことは本当に重要であるため、あなたは何かに終わるのです。それは理にかなっている
要約は、人々がしばしば望むタスクです。
長いテキストを読み、次に短いテキストを生成したところ。それはその要約です。
それはあなたが言語モデルを使うためのものです。
テキストの一部を読み、そのテキストに関する質問に答える

Turkish: 
and I talked to and then just be like
Frank, you know just like introduced a new name because it's guessing at what's likely to come there and it's completely forgotten that sure was
Ever like a thing. So yeah, these kind of dependencies are a big issue with things that you would want to language model to do
But we've only so far talked about
Language models for generating text in this way, but you can also use them for all kinds of different things. So like
people use language models for translation
Obviously you have some input sequence that's like in English and you want to output a sequence in French or something like that
Having a good language model is really important so that you end up with something. That makes sense
Summarization is a task that people often want
Where you read in a long piece of text and then you generate a short piece of text. That's like a summary of that
that's the kind of thing that you would use a language model for or
reading a piece of text and then answering questions about that text or

Or if you want to write a chatbot that's going to converse with people. Having a good language model is useful for basically almost all natural language processing. The other thing is, you can use it to enhance a lot of other language-related tasks. If you're doing speech recognition, there are a lot of things people can say that sound very similar, and to get the right one you need to be able to say, oh, well, this one actually makes sense: this word that sounds very similar would be incoherent in this sentence, it's very low probability, and it's much more likely that they said this other thing, which flows in the language. Human beings do this all the time. Same thing with recognizing text from images: you've got two words that look similar, or there's some ambiguity, and to resolve it you need an understanding of what word would make sense there, what word would fit.
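A sketch of that rescoring idea, with lm_log_prob standing in for any trained language model's scorer; the candidate strings and scores are invented for illustration:

```python
# Several candidate transcriptions sound alike; the language model picks
# the one that actually flows in English.
def lm_log_prob(sentence):
    # Hypothetical scores; a real model would compute these from data.
    scores = {"recognise speech": -4.2, "wreck a nice beach": -11.7}
    return scores.get(sentence, -50.0)

candidates = ["recognise speech", "wreck a nice beach"]  # acoustically similar
print(max(candidates, key=lm_log_prob))  # -> "recognise speech"
```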

Japanese: 
あなたがチャットボットのように書きたいならば、それは基本的にほとんどすべてのように良い言語モデルを持っている人々と会話するつもりです
自然言語処理のように
これは他のものがそうであることは便利です
あなたはそれを強化するために使うことができます
他の多くの言語関連タスクを強化する
あなたが音声認識のようにしているのであればそれなら良い言語モデルを持つこと
人々はそれが非常によく似ていると言うことができることがたくさんあるようにそして正しいものを手に入れるために
あなたはのようになる必要があります、ああ、まあ、これは実際には理にかなっている、あなたが知っている
この単語。それは非常に似ているね
この文には一貫性がないでしょう。非常に低い確率です
それは彼らがこのようなことが言語で流れるだろうということがはるかにありそうです
そして人間はこれをいつも同じことをしています
画像からテキストを認識することで、あなたは知っています
2つの単語が似ているように見えるか、あいまいさがあるかどうかなど、必要なものを解決するために
あ
あなたがニューラルネットワークを使って次のようなことをしようとしているならば、どの単語がそこに当てはまるのかについて理解する

Turkish: 
If you want to write like a chatbot that's going to converse with people having a language model as good like basically almost all
like natural language processing
right is it's useful to have this the other thing is
You can use it to enhance
Enhance a lot of other language related tasks
So if you're doing like speech recognition then having a good language model
Like there's a lot of things people can say that sound very similar and to get the right one
You need to be like, oh, well, this actually makes sense, you know
This word. That sounds very similar
Would be incoherent in this sentence. It's a very low probability
It's much more likely that they this thing which is like would flow in the language
And human beings do this all the time same thing
With recognizing text from images, you know
You've got two words that look similar or there's some ambiguity or whatever and to resolve that you need
an
understanding of what word would make sense there what word would fit if you're trying to use a neural network to do the kind of

If you're trying to use a neural network to do the kind of thing we were talking about before with the phone, you know, autocorrect based on the previous word or two: suppose you've got a sequence of two words going in, you've got "so" and then "I", and you put both of these into your network, and it outputs, say, "said" as a sensible next word. Then what you do is you throw away "so", bring "said" around, and make a new sequence, which is "I said", and put that into your network, and it will put out, "like I said", for example, would make sense, and so on, and you keep going around. But the problem is this length is really short. If you try to make it long enough to contain an entire sentence, just an ordinary-length sentence, this problem starts to become really, really hard. Networks have a hard time learning it, and you don't get very good performance. And even then, you still have this absolute hard limit on how long a thing it can handle: you have to just pick a number.
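The fixed-window scheme as a sketch, with a stubbed-in predict_next standing in for the trained network:

```python
# Feed the last two words in, get a next word out, slide the window along.
def predict_next(window):
    # Stub so the loop runs; a real network would compute this.
    return {"so i": "said", "i said": "that"}.get(" ".join(window), "<end>")

window = ["so", "i"]
output = list(window)
for _ in range(2):
    nxt = predict_next(window)     # network's guess for the next word
    output.append(nxt)
    window = window[1:] + [nxt]    # throw away the oldest word, slide along
print(" ".join(output))            # -> "so i said that"
```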

Japanese: 
私たちが以前に話していたこと、電話を持っていることについて、あなたは前の1つまたは2つの単語に基づいて自動修正を知っています
2つの単語が連続して入ってきて、 "so"、そして "I"と入力したとします。
これらの両方があなたのネットワークに入って、それが出力されるでしょう、あなたは知っています
例えば「言った」のように、賢明な次の単語のように、そしてあなたがしていることはあなたが捨てるか、そうあなたは
あなたのセットを持ってきて、あなたは新しいものを作ります
私が言ったシーケンスをあなたのネットワークに入れてから出すと
私が言ったように - たとえば、理にかなっているとあなたは歩き続けますが、問題は
この長さは本当に短いですあなたが試してみて、これを全体を含むのに十分な長さにする
文は普通の長さの文であり、この問題は本当に本当に困難になり始めます
そして、ネットワークはそれを学ぶのに苦労しています、そして、あなたは非常に良いパフォーマンスを得ません
そしてそれでも
あなたはまだあなたがあなたがちょうどあなたが数を選ぶために持っているものの長さにこの絶対的な厳しい制限を持っているのが好きです

Turkish: 
thing we were talking about before, of having a phone, you know autocorrect based on the previous word or two
Suppose you've got a sequence of two words going in you've got "so" and then "I" and you put
both of these into your network and it will then output, you know
like "said" for example as like a sensible next word and then what you do is you throw away or so and you then
Bring your set around and you make a new
Sequence which is I said and then put that into your network and it will put out
like I said - for example would make sense and so on and you keep going around, but the problem is
This length is really short you try and make this long enough to contain an entire
Sentence just an ordinary length sentence and this problem starts to become really really hard
And networks have a hard time learning it and you don't get very good performance
and even then
You're still like have this absolute hard limit on how long a thing you you have to just pick a number

That number is how far back you're looking. A better thing to do is what's called a recurrent neural network. Let's divide that up. In this case, you have a network and you give it this vector, which is just a bunch of numbers that acts as the memory for that network. The idea is, the problem was that the network has forgotten the beginning of the sentence by the time it gets to the end, so we've got to give it some way of remembering. Rather than feeding it the entire sentence every time, you give it this vector, and you give it just one word at a time of your inputs. This vector you initialize, I guess, with zeros. I want to be clear, this is not something that I've studied in a huge amount of detail; I'm just giving the overall structure of the thing. But the point is, you give it this vector and the word, and it outputs its guess for the next word, and also a modified version of that vector. Then for the next step, you give it the word it spat out, or the sequence it spat out, and its own modified version of the vector. Every cycle that goes around, it's modifying this memory.
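A minimal sketch of that loop, with random untrained weights; the names W_h, W_x, W_out are just for this illustration:

```python
import numpy as np

# One word at a time goes in together with a memory vector, and a
# modified vector comes out. Training would shape what the vector stores.
rng = np.random.default_rng(0)
hidden_size, vocab_size = 16, 100
W_h = rng.normal(size=(hidden_size, hidden_size)) * 0.1
W_x = rng.normal(size=(hidden_size, vocab_size)) * 0.1
W_out = rng.normal(size=(vocab_size, hidden_size)) * 0.1

def step(word_id, h):
    x = np.zeros(vocab_size); x[word_id] = 1.0   # one-hot input word
    h_new = np.tanh(W_h @ h + W_x @ x)           # updated memory vector
    logits = W_out @ h_new                       # scores for the next word
    return int(logits.argmax()), h_new

h = np.zeros(hidden_size)      # memory starts at zeros
word = 3                       # some starting token id
for _ in range(5):
    word, h = step(word, h)    # output fed back in, memory carried along
```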

Japanese: 
それは、私がどれほど前に戻ってあなたが再帰ニューラルネットワークを言うためにもっと良いことを探しているかのようですか？どこで
あなたはものをあげる。それを分けてみましょう
それで、この場合、あなたはそれにこのベクトルを与えるネットワークがありますか？
あなたはちょうど記憶のようになるつもりである数字の束を持っているのが好きです
そのネットワークの問題は、問題が発生するまでに文の最初で忘れられているという問題です。
最後に覚えておく方法を教えて
このベクトルを与えるたびに文全体を送るのではなく
あなたはあなたのインプットの時にそれをただ一つの単語に与える
このベクトルは、あなたが初期化したもので、私はゼロで推測します。はっきりさせたい
これは私が膨大な詳細で勉強したものではありません
私は物事の全体的に似た構造を与えるようなものです。しかし重要なのは、このベクトルと単語を
それは次の単語に対する推測を出力します。
そのベクトルの修正版で、次のものにします。
吐き出した場所、吐き出した順番
循環するサイクルごとにベクトルの独自の修正版。それはこの記憶を修正しています

Turkish: 
That's like how far back am I looking a better thing to do you say recurring neural network? Where you
You give the thing. Let's like divide that up
So in this case, then you have a network you give it this vector?
You just like have a bunch of numbers which is gonna be like the memory
for that network is the idea like the problem is it's forgotten in the beginning of the sentence by the time it gets to the
end so we've got to give it some way of remembering and
rather than feeding it the entire sentence every time you give it this vector and
you give it to just one word at a time of your inputs and
This vector, which you initialize I guess with zeros. I want to be clear
This is not something that I've studied in a huge amount of detail
I'm just like giving the overall like structure of the thing. But the point is you give it this vector and the word and
it outputs its guess for the next word and also a
Modified version of that vector that you then for the next thing you give it
where did it spit out or the sequence that it spit out and
Its own modified version of the vector every cycle that goes around. It's modifying this memory

Japanese: 
このシステムが非常によく訓練されたようになれば
もしあなたがそれに最初の単語Shawnを与えればそれを与えるならばこのベクトルの一部はいくつかを含むことになるでしょう
このセンテンスのこの主題のような情報は短い単語です
他の部分はおそらく同じように追跡します
この文とその類のものには男性の代名詞を使うことを期待します。
それであなたはこれを持ってそれをそれに渡します、そしてこれらは同じネットワークのちょうど2つの例です、そしてそれはそれを続けています
毎回
だから私はこのようにそれが吐き出すので、AIもここに来てあなたはその後外に置くかもしれないなど
しかし、それはこの継続的なスレッドを持っています
それが最初に重要な何かを見つけ出すならばそれが原則として物事を通過し続けるので、記憶の効果的に通過する
ええと
シェイクスピアの全作品は、それが生み出すものです。何もない
厳密に言えば、それが通過することから固執するのを止めること
繰り返しから繰り返し、毎回繰り返し

Turkish: 
Once this system is like trained very well
If you give it if you give it the first word Shawn then part of this vector is going to contain some
information that's like this subject of this sentence is the word short and
some other part will probably keep track of like
We expect to use a male pronoun for this sentence and that kind of thing
So you take this and give it to that and these are just two instances of the same network, and then it keeps going
every time
So it spits out like this is I so then the AI also comes around to here you might then put outside and so on
But it's got this continuous thread of
of memory effectively going through because it keeps passing the thing through in principle if it figures out something important at the beginning of
You know
The complete works of Shakespeare that it's generating. There's nothing
Strictly speaking stopping that from persisting from being passed through
From from iteration to iteration to iteration every time

Once this system is trained very well, if you give it the first word, "Shawn", then part of this vector is going to contain some information like "the subject of this sentence is the word Shawn", and some other part will probably keep track of something like "we expect to use a male pronoun for this sentence", that kind of thing. So you take this and give it to that, and these are just two instances of the same network, and it keeps going, every time. It spits out a word, and the vector also comes around to the next step, and so on. But it's got this continuous thread of memory effectively going through, because it keeps passing the thing along. In principle, if it figures out something important at the beginning of, you know, the complete works of Shakespeare that it's generating, there's nothing, strictly speaking, stopping that from persisting, from being passed through from iteration to iteration to iteration, every time.

In practice, it doesn't work that way, because in practice the whole thing is being messed with by the network on every step, and so in the training process it's going to learn that it performs best when it leaves most of it alone, rather than randomly changing the whole thing. But by the time you're on the fiftieth word of your sentence, whatever the network decided to do on the first word is a photocopy of a photocopy of a photocopy of a photocopy, and so things have a tendency to fade out to nothing. It has to be successfully remembered at every step of this process, and at any point it might get overwritten with something else. Or it did its best to remember, but it's actually only remembering 99% of it each time, and 0.99 to the power of 50 is actually not that big a number. So these things work pretty well, but the performance still drops off really quickly once the sentences start to get long. So this is a recurrent neural network, an RNN.
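The arithmetic behind that claim:

```python
# "Remembering 99% of it each time" compounds away quickly:
print(0.99 ** 50)    # ~0.605 — only ~60% of the signal survives 50 steps
print(0.99 ** 500)   # ~0.0066 — over a long text it's essentially gone
```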

Turkish: 
In practice, it doesn't work that way because in practice
The whole thing is being messed with by the network on every step and so in in the training process it's going to learn
That it performs best when it leaves most of it alone and it doesn't just randomly change the whole thing
But by the time you're on the fiftieth word of your sentence
whatever the network decided to do on the first word of the sentence is a
photocopy of a photocopy of a photocopy of a photocopy and so
things have a tendency to
Fade out to nothing. It has to be successfully remembered at every step of this process
and if at any point it gets overwritten with something else or just
It did its best to remember it but it's actually remembering 99% of it each time point nine
Nine to the fifty is like actually not that big of a number
So these things work pretty well, but they still get the performance like really quickly drops off once the sentences start to get long
So this is a recurrent neural network

Japanese: 
実際には、そのようには機能しません。
すべてのことがネットワークによってすべてのステップで惑わされているので、トレーニングプロセスの中でそれは学ぶつもりです
それがほとんどそのままにしておき、それが全体をランダムに変更するだけではないときに最高のパフォーマンスを発揮すること
しかし、あなたはあなたの文章の50番目の単語に着いている時までに
ネットワークが文の最初の単語に対して行うことを決定したものは何でも
コピーのコピーコピーのコピー
物事は
何もしないでください。それはこのプロセスのあらゆるステップで首尾よく覚えられなければなりません
そしていずれかの時点でそれが何か他のもので上書きされたり、単に
覚えておくのが最善を尽くしましたが、実際には各時点9の99％を覚えています。
50までの9は実際にはそれほど大きくない数のようなものです
したがって、これらはうまく機能しますが、文が長くなり始めると、パフォーマンスがすぐに低下します。
だからこれはリカレントニューラルネットワークです

Japanese: 
rnlこれらすべてのボックスのため
これは異なる時間ステップで同じネットワークであるため、本当に同じボックスです。これは本当にこのようなループです
毎回ネットワークの出力を入力として返すので、これはうまく機能し、それから人々はあらゆる種類の面白いことを試してみました
LS TMSのようなもの。このネットワークにはリカレントネットワークのようなあらゆる種類の変種があります
LS TMがそのことです。それは使うかもしれませんね？ちょっとシュールなのです
しかし、ええ、それで、その考えは、これらのネットワークの内部ではずっと複雑です。
実際にゲーティングについて特定の決定を下す種類のサブネットワークがあります。そう
このシステムに、ほとんどのものを渡すべきであることを知らせる必要があるのではなく、渡されるアーキテクチャーでもっと多くのものを渡します。
ほとんどのものは上にあり、それから学習の一部のようなものがあるサブがあります
何を忘れるかを決める
各ステップで、何を変更し、何をどの小包に入れるかなどを決定するのが好きで、パフォーマンスが向上します。
彼らはより長い間関連情報を情報に頼ることができます

Turkish: 
rnl because all of these boxes
Are really the same box because this is the same network at different time steps. It's really a loop like this
You're giving the output of the network back as input every time so this works better and then people have tried all kinds of interesting
Things things like LS TMS. There's all kinds of variants on this general like recurrent Network
LS TM is the thing. That might use isn't it? Right right long short-term memory, which is kind of surreal
But yeah, so the idea of that is it's a lot more complicated inside these networks
There's actually kind of sub networks that make specific decisions about gating things. So
Rather than having to have this system learn that it ought to pass most things on it's sort of more in the architecture that passes
most things on and then there's a there's a sub there's like part of the learning is
Deciding what to forget
At each step and like deciding what to change and what to put it in what parcel and so on and they perform better
They can hang on to the information the relevant information for longer

All of these boxes are really the same box, because this is the same network at different time steps. It's really a loop: you're giving the output of the network back as input every time. So this works better, and then people have tried all kinds of interesting things, like LSTMs. There are all kinds of variants on this general recurrent network idea. The LSTM is the one that tends to get used, isn't it? Right, long short-term memory, which is kind of a surreal name. The idea of that is, it's a lot more complicated inside these networks: there are actually sub-networks that make specific decisions about gating things. So rather than having the system learn that it ought to pass most things on, it's more in the architecture that it passes most things on, and then part of the learning is deciding what to forget at each step, deciding what to change and what to put where, and so on. They perform better: they can hang on to the relevant information for longer.
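A sketch of that gating idea. In a real LSTM each gate is computed from the previous state and current input through learned weights; here the gate values are supplied directly, just to show their effect on the memory:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_memory_update(c_prev, forget_logits, input_logits, candidate):
    f = sigmoid(forget_logits)   # ~1 means "keep this memory slot"
    i = sigmoid(input_logits)    # ~1 means "write new content into this slot"
    return f * c_prev + i * np.tanh(candidate)  # memory persists by default

c = np.array([0.8, -0.3, 0.5])                    # current memory vector
c = lstm_memory_update(
    c,
    forget_logits=np.array([4.0, 4.0, -4.0]),     # keep slots 0,1; wipe slot 2
    input_logits=np.array([-4.0, -4.0, 4.0]),     # write only into slot 2
    candidate=np.array([0.0, 0.0, 0.9]),
)
print(c)  # slots 0,1 nearly unchanged; slot 2 replaced with new content
```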

Japanese: 
しかし、人々がこの種のシステムに組み入れることが多いもう1つのことは、アテンションと呼ばれるものです。
これは実際にはかなり良い比喩です
同じようにどこにありますか？
あなたの隠れた状態のどの部分にハングアップするか、どの部分を忘れるかを決めるネットワーク
ゲーティングやもののようなこれらの種類の決定
あなたは入力のどの部分をどの部分で使うべきかに注意を払うべきかを決定するシステムを持っています
計算のどの部分を無視すればよいか、これは実際には非常に強力であることがわかります。だからこの論文がありました
これはいつでしたか。
2000年
そう、これは同じ年になったので、これは面白いです。
YouTubeのコメントを生成するためのビデオです。これは12月です。ビデオは今10月の古代史だったと思います
さて、私たちは2年前に話しています。これの考えはそれが呼ばれる注意があなたが必要とするすべてであるのでです。彼らはこのシステムを開発しました。どこで

Turkish: 
But the other thing that people often build into these kinds of systems is something called attention
Which is actually a pretty good metaphor
Where in the same way that you would have?
networks that decide which parts of your hidden state to hang on to or which starts to forget or
Those kinds of decisions like gating and stuff
You have a system which is deciding which parts of the input to pay attention to which parts to use in
The in the calculation and which parts to ignore and this turns out to be actually very powerful. So there was this paper
When was this?
2000
2017. Yeah, so this is funny because this came out the same year as
The video you have about generating YouTube comments. This is in December. I think that video was October ancient history now
Alright, we're talking two years ago. The idea of this is as its called attention is all you need. They developed this system. Whereby

But the other thing that people often build into these kinds of systems is something called attention, which is actually a pretty good metaphor. In the same way that you have networks deciding which parts of the hidden state to hang on to or which parts to forget, those kinds of gating decisions, you have a system which is deciding which parts of the input to pay attention to: which parts to use in the calculation and which parts to ignore. And this turns out to be actually very powerful. So there was this paper. When was this? 2017. Yeah, so this is funny, because it came out the same year as the video you have about generating YouTube comments. This is December; I think that video was October. Ancient history now, right, we're talking two years ago. The paper is called "Attention Is All You Need", and in it they developed this system.

It's actually a lot simpler as a network. You can see in the diagram here, if you compare this to the diagram for an LSTM or any of those kinds of variants, it's relatively simple, and it's just using attention to do everything. So when I made that video, the LSTM-type stuff was state-of-the-art, and that was until a couple of months later, I guess, when this paper came out. The idea, as the title says, is that attention is all you need: this stuff about having gates for forgetting things, all of that, in fact your whole recurrent architecture, you can do away with it and just use attention. Attention is powerful enough to do everything that you need.

Japanese: 
実際には
それはずっと簡単です
これをLS TMのダイアグラムと比較すると、ネットワークとしてここにダイアグラムを見ることができます。
そのような変種はありますか？それは比較的単純で、すべてを行うために注意を使用しているようなものです。
だからそのビデオを作ったときにASTMタイプのものは最先端のようであり、それは数ヶ月後まででした
この論文が出てきたとき、私が思うに、このアイディアは、あなたが必要としているのは、このようなものであることだけです。
物事を忘れるための門を持つ
そのようなものすべてのことすべて実際にはアーキテクチャのようにあなたの全体の再発
あなたはそれを廃止し、単に注意を使うことができます
あなたがその基本的な注意で必要とするすべてをすることは同じ方法で積極的に決めることについてです

Turkish: 
the LS TM is actively deciding what to forget and so on this is deciding which parts of
some other part of the data it's going to
take into account which parts it's going to look at like it can be very dangerous in AI to
use words for things that are words that people already use
For the way that humans do things. It makes it very easy transform for more finds and just
make, you know get confused because the abstraction doesn't quite work but I think attention is a pretty decent thing because it is
It does make sense
It sort of draws the relationships between things so you can have attention from the output to the input
Which is what that would be you can also have attention from the output to other parts of the output
so for example when I'm generating in that sentence like
Shawn came to record a video or whatever by the time I get to generating the word him
I don't need to be thinking about the entire sentence
I can just focus my attention on where I remember
The name was so the attention goes to Shawn and then I can make the decision for to use the word him based on

Japanese: 
LS TMは何を忘れるべきかを積極的に決定しています。
データの他の部分
それがAIに非常に危険であることができるようにそれが見ることになっているどの部分を考慮に入れるか
人々がすでに使っている言葉であるものに言葉を使う
人間がものをする方法のために。それはより多くの発見のためにそれを非常に簡単に変換します
抽象化はうまく機能しないので、混乱してしまうかもしれませんが、注意が必要なのでかなり注意が必要です。
意味がありますか
それは一種のものの間の関係を描くので、あなたは出力から入力まで注意を払うことができます
これは、あなたが出力から出力の他の部分にも注目することができるということです
たとえば、その文で次のように生成するとします。
ショーンはビデオを録画するようになった。
文全体について考える必要はありません
覚えているところにだけ注​​意を集中することができます
名前はそうショーンに注意が行きますので、私は彼に基づいて彼の言葉を使用するための決定を下すことができるようになりました

At its base, attention is about actively deciding, in the same way that the LSTM is actively deciding what to forget, which parts of some other part of the data it's going to take into account, which parts it's going to look at. It can be very dangerous in AI to use words for things that people already use for the way humans do things; it makes it very easy to anthropomorphize and get confused, because the abstraction doesn't quite work. But I think attention is a pretty decent name, because it does make sense: it sort of draws the relationships between things. So you can have attention from the output to the input, which is what that would be, and you can also have attention from the output to other parts of the output. For example, when I'm generating that sentence, "Shawn came to record a video" or whatever, by the time I get to generating the word "him", I don't need to be thinking about the entire sentence. I can just focus my attention on where I remember the name was. The attention goes to "Shawn", and then I can make the decision to use the word "him" based on that.
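A minimal sketch of the operation at the heart of that paper, scaled dot-product attention; the shapes and data here are arbitrary:

```python
import numpy as np

# Each position makes a query; dot products against keys decide where to
# look, and the output is a weighted mix of the values at those positions.
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # how relevant is each position?
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V                              # blend of attended values

# e.g. the position generating "him" can put most of its weight on "Shawn"
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 8)) for _ in range(3))  # 6 positions, dim 8
print(attention(Q, K, V).shape)                        # (6, 8)
```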

So rather than having to hang on to a huge amount of memory, you can just selectively look at the things that are actually relevant, and the system learns where to look, where to pay attention. And that's really cool. There are attention-based systems for all kinds of things, not just text. Suppose your input is an image and you want to caption it. When it's outputting the sequence, you can ask: when you generated the word "dog", what were you looking at? You can get an attention heat map, and it will highlight the dog, because that's the part of the image it was paying attention to when it generated that output. It makes your system more interpretable, because you can see what it was thinking, and sometimes you can catch problems that way as well, which is kind of fun. It generates an output like "a man is lifting a dumbbell" or something like that, and you look at it...

Japanese: 
それ
そう
大量のメモリを使用するのではなく、
実際に関連性があり、システムが学習したものを選択的に見ることができます。
どこに注意を払うべきか、どこで見ればいいですか。
あなたができるテキストだけでなく、あらゆる種類のもののための注意ベースのシステムがあります。
入力が画像のようで、キャプションを付けたいとします。
あなたはそれがあなたが単語dogを生成したときに言うことができるシーケンスを出力していたときに実際に見ることができます
あなたが注意喚起ヒートマップのようになることができるあなたの何でした、そしてそれは犬をハイライトします
それはその出力を生成したときにそれが注意を払っていたというイメージの一部だから
あなたがそれが考えていたものを見ることができ、時にはあなたがそのように問題を見つけることができるので、それはあなたのシステムをより解釈しやすくします
どのような楽しみです
男がダンベルなどを持ち上げているような出力を生成します。

Japanese: 
そしてそれは実際には正しくありません。それはその所有者のトロットのようなものです、そして私は彼がマグカップからお茶を飲んでいるのですよね。
あなたが見つけたのは、あなたがあなたのことを見たときです。
それはあなたが注意を見てダンベルを言う場所に出力し、注意は主に腕を見ているようなものです。それは通常誰かが筋肉
誰があなたの写真の中でダンベルを上げていますか？
それはそれであり、それはそれが腕を見ていたのでこの種のマグカップのように見えるという事実を無効にしています
つまり、トランスと呼ばれるこのシステムはニューラルネットワークの一種です。
これは、以下のことに注意を払っている
最先端のパフォーマンスを生み出します。
彼らが学ぶことができる自然言語のコーパス
彼らは非常にうまくやることを学ぶことができます、彼らはあなたに彼らを与えます彼らは非常に強力な言語モデルであることができます
私たちはあなたの携帯電話に言語モデルの例を持っていました
これは非常に基本的なもので、それからニューラルネットワークでこれをやろうとすること、そして記憶することの問題のようなものです。
それで、あなたはそれらを追跡し続けるあなたが初めを覚えることができるようにあなたが一緒にメモリを渡すことを可能にする繰り返しのシステムが好きです
少なくともそれの終わりまでに文の

And it's not actually correct: it's a guy drinking some tea out of a mug, right? And what you find, when you look at the outputs where it says "dumbbell" and you look at the attention, is that the attention is mostly on the arms. It's usually somebody muscular who's lifting the dumbbell in your photos, and so that's overriding the fact that this thing kind of looks like a mug, because it was looking at the arms. So the idea is that this system, which is called a transformer, is a type of neural network that relies very heavily on attention to produce state-of-the-art performance, and if you train them on a large corpus of natural language, they can learn to do very well; they can be very powerful language models. We had the example of a language model on your phone, which is very, very basic, and then trying to do this with neural networks and the problems with remembering. So you have recurrent systems that allow you to pass memory along, so that you can remember the beginning of the sentence at least by the end of it.

Turkish: 
And it's not actually correct. It's like its owner trots and I go he's drinking some tea out of a mug, right and
what you find is then when you look at your
Outputs where it says dumbbell you look at the attention and the attention is like mostly looking at the arms. That's usually somebody muscular
Who's lifting the dumbbell in your photos?
It's and so it it's overriding the fact that this kind of looks like a mug because it was looking at the arms
So the idea is this system which is called a transformer is a type of neural network
which just relies very heavily on attention to
Produce like state-of-the-art performance and if you train them on a large
corpus of natural language they can learn
They can learn to do very well, right they give you they can be very powerful language models
We had the example of a language model on your phone
That's like a very very basic and then trying to do this with neural networks and the problems with remembering
And so you have like recurrent systems that keep track of they allow you to pass memory along so that you can remember the beginning
of the sentence at least by the end of it and

Japanese: 
LSTMのようなもの人々がさまざまなことを試みることすべてがこれらのさまざまな種類があります
それは彼らが彼らがより長い期間を持つことができるそれがよりよくすることができるようにより良いそしてメモリに頼っています
依存性。これにより、より一貫性のあるものにすることができます。
アウトプット
一般的に優れた性能では、トランスは
その亜種はありますか？
あなたが本当に注意を向けているところでは、まさに別のやり方です。それで、これらは実際には繰り返し発生しません。
このようなことがないようにするための重要な違い
毎回出力を取り、入力としてフィードバックするなど
注目しているから。私達は大きな記憶を保つ必要はありません
システムがその部分を振り返るために注意を使うことができる何かを知りたいと思う時はいつも私達は走り抜けます
テキストを暗記するのとは違います。それは
テキストのさまざまな部分に注意を払う

Turkish: 
Things like LSTMs there is all these different varieties that people try different things
That are better and hanging on to memory so that they can do better it they can have longer term
Dependencies, which allows you to have more coherent
outputs
in just generally better performance, and then the transformer is
Is a variant on that?
Well is a different way of doing things where you really focus on attention. And so these are actually not recurrent which is an
important distinction to make we don't have this thing of like
Taking the output and feeding that back as the input and so on every time
Because we have attention. We don't need to keep a big memory
That we run through every time when the system wants to know something it can use its attention to look back to that part
It's not like memorizing the text as it goes. It's
paying attention to different bits of the text as

And with things like LSTMs, there are all these different varieties where people try different things that are better at hanging on to memory, so that they can do better, they can have longer-term dependencies, which allows you to have more coherent outputs and just generally better performance. And then the transformer, is that a variant on that? Well, it's a different way of doing things, where you really focus on attention. These are actually not recurrent, which is an important distinction to make: we don't have this thing of taking the output and feeding it back as the input every time. Because we have attention, we don't need to keep a big memory that we run through every time. When the system wants to know something, it can use its attention to look back to that part. It's not memorizing the text as it goes; it's paying attention to different bits of the text as it thinks they're relevant to the bit it's looking at now.

The thing about that is, when you have this recurrent thing, it's kind of inherently serial. Most of the calculations, you can't do them until you have the inputs, and the inputs are the output of the previous step. So you can't do the thing that people like to do now, which is run it on a huge number of machines and get lightning-fast performance, because you have to go through the sequence in order, right? It's inherently serial, whereas transformers are much more parallelizable, which means you get better computational performance out of them as well. That's another selling point: they work better and they run faster. So they're really a step up. Transformers are this really powerful architecture; they seem to give really good performance on these language-modeling-type tasks. But what we didn't really know was how far you can push them, or how good they can get.
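A rough sketch of why that matters computationally, assuming a toy recurrent update versus attention-style mixing over the whole sequence:

```python
import numpy as np

T, d = 1000, 64
X = np.random.normal(size=(T, d))
W = np.random.normal(size=(d, d)) * 0.01

# Recurrent: inherently serial — step t must wait for h from step t-1.
h = np.zeros(d)
for t in range(T):
    h = np.tanh(W @ h + X[t])

# Attention-style mixing: all T positions computed at once, as a couple of
# matrix multiplies that hardware can batch and parallelize.
scores = X @ X.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ X   # shape (T, d), no per-step dependency
```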

What happens if you take this architecture and give it a bigger data set than any of them has ever been given, and more compute to train with, you know, a larger model with more parameters and more data? How good can these things get? How good a language model can you actually make? That's what OpenAI was doing with GPT-2.
