
English: 
People need to learn to use standardized measures for things. So take me,
for example: when I drive anywhere, I drive in miles, and I drive in miles per hour.
My fuel economy is measured in miles per gallon, but of course, I don't pump fuel in gallons,
I pump it in litres.
But when I run anywhere, so short distances, I run in kilometres, and I run in kilometres per hour.
So I'm using two different systems there, and any short distances I'm measuring are going to be in metres, not feet, right?
So if I'm measuring, let's say,
around my house for painting, I'm going to measure in square metres so I know how much paint to buy, but then
if I'm selling a house, or I'm buying a house,
I'm going to be looking at the size of the house in square feet again. Who knows why? British people.
If I'm baking anything, it's going to be weight in grams or kilograms going into the recipe,
but if I'm weighing myself, it's going to be in stones and
pounds. But of course a ton, for me, would be a metric tonne, not an imperial ton.

English: 
As I said, I measure fuel in litres, and most of my liquids are measured in litres, except of course for beer and milk,
which are in pints. So this is the kind of problem
you're going to be dealing with when you're looking at data. You're trying to transform your data into a usable form.
Maybe the data is coming from different sources
and none of it goes together. You need standardized units and standardized scales so we can go on and analyse it.
Let's think back:
what we're doing is trying to prepare our data into the
densest, cleanest format, so that we can apply modelling or machine learning or some kind of statistical
test to work out what's going on and draw knowledge from our data. So this is going to be an iterative process:
we're going to be cleaning the data,
we're going to transform the data, and then we're going to reduce the data. Transforming data is what we're going to do today.
So let's imagine that you've cleaned your data: we've got rid of as many missing values as possible,
hopefully all of them, and we've deleted instances and attributes that just weren't going to work out for us.
Now what we're going to try and do is transform our data so that everything's on the same scale,
everything makes sense together, even if we're bringing in datasets from different places.
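That clean, transform, reduce loop can be sketched over a toy table. Everything here is made up for illustration: the column names, the codes, and the values are hypothetical, not the actual census data.

```python
# A toy dataset: one dict per observation (column names are hypothetical).
rows = [
    {"age": 39, "workclass": "State-gov", "hours": 40},
    {"age": 50, "workclass": "Private", "hours": None},   # missing value
    {"age": 38, "workclass": "Private", "hours": 40},
]

# Clean: drop instances with missing values.
clean = [r for r in rows if None not in r.values()]

# Transform: codify the nominal text column into a numerical score.
codes = {"State-gov": 0, "Private": 1}
transformed = [{**r, "workclass": codes[r["workclass"]]} for r in clean]

# Reduce: keep only the attributes we intend to model on.
reduced = [{"age": r["age"], "workclass": r["workclass"]} for r in transformed]

print(reduced)  # the row with the missing value is gone
```

In practice each of these steps gets revisited several times, which is what makes the process iterative.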

English: 
We also need to make sure that the units are the same and everything makes sense.
There's no point in trying to use machine learning, or clustering, or any other mechanism
to draw knowledge from our data if our data is all wrong.
So today we're going to be looking at census data. Now, census data is kind of a classic example of the kind of data you
might look at in data analysis. It's got lots of different kinds of attributes, things that are going to need cleaning up and transforming.
So we're back in R, and we're going to read the census data in, using census <- read.csv.
We've downloaded some census data that represents samples from the US population.
To begin with we're going to read that in, and you can see that we've got 32,000 observations and 15 attributes, or variables.
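The video does this in R with census <- read.csv(...). A rough equivalent sketch in Python, using the standard csv module and a tiny in-memory sample, since the actual file isn't shown here:

```python
import csv
import io

# A tiny stand-in for the census file (the real one has ~32,000 rows, 15 columns).
sample = io.StringIO(
    "age,workclass,education,hours_per_week\n"
    "39,State-gov,Bachelors,40\n"
    "50,Self-emp-not-inc,Bachelors,13\n"
)

reader = csv.DictReader(sample)
census = list(reader)

print(len(census))      # number of observations in the sample
print(list(census[0]))  # attribute (column) names
```

For a real file you would pass open("census.csv") instead of the StringIO buffer.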
So what are these attributes? Let's have a quick look at just a little bit of it, and we can see the kind
of thing we're looking at. So we're going to say head(census), and that's just going to produce the first few rows,
so we can see the kind of data. You can see we've got age,
we've got what working classification that person has, their educational level, a

English: 
numerical representation of whether they're married or not, this kind of thing.
So there's a lot of different kinds of data here; some of it's going to be nominal.
So for example, this working class: state government, private employee. That's a nominal value.
We might have ordinal values, or ratio values, or interval values.
All right,
we're going to have to delve in a little bit closer to find out what these are.
Now, what we do to transform this data into a usable format for clustering or machine learning
is going to depend on exactly what the types of these columns are and what we want to do with them.
So let's look at just a couple of the attributes and see what we can do with them, right?
We're going to use a process called codification. The idea is that maybe things like random forests or
multi-layer perceptrons, you know, neural networks, aren't going to be very amenable to text-based inputs,
and what we want to do is try and replace these attributes with a numerical score.
All right,
so let's look at, just for example, the working class, and also, for example,
the educational level. So, education: now, work class is the kind of class of worker that we're looking at here,

English: 
so for example a state worker, or someone in the private sector, or someone that worked in a school, or something like this. Now,
this is a nominal value. That means there's no order to this data at all:
we can't say that someone in state is higher or lower than someone in private, and we can't say, let's say,
that state is two times more or less than some other one. That makes no sense at all.
So what we can do is replace this with numbers.
So let's say we could replace private with zero, and state with one, and,
you know, self-employed with two, and so on, right? And that's a perfectly reasonable thing to do, but it's still nominal data.
So what we can't do is then calculate a mean and
say the mean is halfway between private and public; that doesn't make any sense. Just because something has been replaced by a numerical score
doesn't mean that it actually represents something that we can quantify in that way, right? It's still nominal data.
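As a small sketch of that codification, with a made-up mapping: the mode of the codes is still meaningful for nominal data, but the mean is not.

```python
from collections import Counter

# Hypothetical codification of a nominal attribute.
codes = {"Private": 0, "State-gov": 1, "Self-employed": 2}

workclass = ["Private", "Private", "State-gov", "Self-employed", "Private"]
codified = [codes[w] for w in workclass]

# The mode (most common value) still makes sense for nominal data...
mode_code = Counter(codified).most_common(1)[0][0]
print(mode_code)  # 0, i.e. "Private"

# ...but the mean does not: 0.6 is not "somewhere between Private and State-gov".
meaningless_mean = sum(codified) / len(codified)
print(meaningless_mean)
```

The numbers are arbitrary labels; any other assignment of codes would change the "mean" while describing exactly the same data.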
Okay, so I think the best advice I can give is: feel free to codify your data into easy-to-read numbers,
but just bear in mind that while you can calculate the mode, you know,
the most common value, you can't calculate the median and you can't calculate the mean. Another example would be something like the

English: 
educational level. Now, if you think about it, this is ordinal data, so we could say that someone with an undergraduate degree
is maybe slightly higher, in terms of the amount of time they spent in education, than someone with a high school diploma.
But we don't know exactly what the distance is.
What's the distance between, let's say, a high school diploma and a degree, and then a PhD,
and so on, an MD, and things like this?
We can represent these
using numbers, and probably in order, right? So we could say that zero is no
education, and one is sort of the end of primary school, and two is the end of high school, and so on and so forth.
But again, it's difficult to calculate distances between these things.
We don't know that high school is two times more than primary school, and half of a degree, or something like that.
That doesn't really make sense.
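A sketch of that ordinal encoding, with hypothetical level names: the median respects only the ordering, so it stays meaningful, while a mean would assume distances between levels that were never defined.

```python
import statistics

# A hypothetical ordered codification of education levels.
order = ["none", "primary", "high-school", "undergraduate", "phd"]
rank = {level: i for i, level in enumerate(order)}  # none=0, primary=1, ...

education = ["high-school", "undergraduate", "primary", "high-school", "phd"]
ranks = [rank[e] for e in education]

# The median only uses the ordering, so it is meaningful here.
print(statistics.median(ranks))  # 2, i.e. "high-school"

# A mean of 2.4 would claim a distance between levels that we never defined.
```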
So again, you might be able to calculate a median on this, or a mode, but you can't calculate an average.
You can't say the average level of education is halfway between high school and undergraduate; that doesn't make any sense either.
So for any kind of attribute that is nominal, or possibly ordinal, and is sort of represented using text,

English: 
we can codify this so that it's more amenable to things like decision trees, depending on the library you're using, right?
But you just have to be careful: all machine learning
algorithms will take any number you give them, and you just have to be careful that this makes sense to do.
So what you would do is go through your data and begin to systematically replace appropriate attributes with numerical versions of themselves,
remembering all the time that they don't necessarily represent true numbers, you know, in a ratio or interval format.
So for any text-based value, we're going to start to replace these, possibly with numerical scores. What about the numerical values?
Well, they might be okay, but the issue is going to be one of scale.
You might find, for example, in this census data, that one of the
dimensions, or one of the attributes, is much, much larger than another one. So for example, this data set has hours per week,
which is obviously going to be somewhere between nought and maybe 60 or 70 hours for someone that's got,
you know, a very strong work ethic, and
salary, or income, or any other measure of, you know,
monetary gain. Now obviously hours per week is going to be in the tens, and

English: 
salary could be into the tens of thousands, maybe even the hundreds of thousands.
Those scales are not even close to being the same. That means if you're doing clustering or machine learning on this kind of data,
you're going to find that the salary is kind of overbearing everything, right?
So it's going to be very easy for your clustering to find differences in salary, and it's harder for it to spot differences in hours,
because they're so small in comparison.
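You can see the problem with two made-up people: on the raw scales, the distance between them is almost entirely the salary difference, even when the hours difference is the bigger change proportionally.

```python
import math

# Two hypothetical people: (hours per week, salary).
a = (40, 30_000)
b = (60, 31_000)

# Unscaled Euclidean distance: the 1,000 difference in salary swamps
# the 20-hour difference in working time.
d = math.dist(a, b)
print(round(d, 1))  # ~1000.2: almost entirely salary
```

Any distance-based method, clustering included, would treat these two as differing only in salary.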
Right. So we need to start to bring everything onto the same scale. The more attributes you have, which is another way of saying the
more dimensions you have to your data,
the further everything is going to be spread around. If we can scale all of these values to between, sort of, let's say, around
0 and 1, then everything gets more tightly controlled in the middle,
and so it gets much easier to do
clustering, or machine learning, or any kind of analysis we want.
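Squashing everything into roughly 0 to 1 is usually done with min-max scaling, x' = (x - min) / (max - min). A minimal sketch, with made-up hours and salary values:

```python
def min_max(values):
    """Rescale a list of numbers to the range 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

hours = [13, 40, 60, 99]
salary = [20_000, 30_000, 55_000, 120_000]

print(min_max(hours))   # smallest value maps to 0.0, largest to 1.0
print(min_max(salary))  # both columns now live on the same 0-1 scale
```

After this, a difference of 0.1 means the same fraction of the observed range in either column, so neither attribute dominates a distance calculation.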
So let's look back at our data and see what we can do to try and scale some of this into the right range.
So we're going to look back at the head of our data again.
Our numerical values are things like the capital gain and the capital loss, which I guess is presumably how much money they've made or

English: 
or lost that year, on some scale,
and then things like the hours per week that they work, and their salary, which in this case is either greater than or less than
50,000. So let's have a quick look at the kind of range of values
we're looking at here, so we can see if scaling's even necessary.
Maybe we got lucky and the person did it before they sent us the data.
So we're going to apply a function across all the columns, and we're going to calculate the range of the data.
So this is going to be an apply over our census data, too.
So that's all of our columns, and we're going to use the range function for this, and this is going to tell us, okay,
so for example the age ranges from 17 to 90, the education level from 1 to 16.
It gives you the range for things like nominal values as well, but they don't really make any sense.
I mean, workclass ranging from question mark to without-pay, you know, is meaningless. And then, for example, capital gain
ranges from zero to nearly one hundred
thousand, and capital loss from zero
to four thousand, and finally the hours per week range from 1 to 99
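A minimal sketch of the kind of call being run here, assuming the data frame is called census; the column names and values below are made-up stand-ins based on the narration:

```r
# Toy stand-in for the census data frame used in the lecture
# (column names and values here are assumptions, for illustration).
census <- data.frame(
  age            = c(17, 35, 52, 90),
  education.num  = c(1, 9, 13, 16),
  capital.gain   = c(0, 0, 5000, 99999),
  capital.loss   = c(0, 4000, 0, 0),
  hours.per.week = c(1, 40, 60, 99)
)

# sapply runs range() on every column and returns a
# two-row matrix: the minimum and maximum of each attribute.
sapply(census, range)
```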

So you can see that the capital gain is many orders of magnitude larger in scale than the hours per week.
We're going to need to try and scale this data. We'll begin, to make our lives a little bit easier, by
just focusing on the numerical attributes, right, so we don't have to worry about the nominal values, which we've not codified yet.
We're going to select all the columns from the data where they are numeric. So that's this line here, and then down here
we're going to sapply, which applies over each of the fields,
is it numeric, and that's going to give us a
logical list that says true or false depending on whether those columns are numeric.
What we're doing here is selecting from this list any that are true and then finding their names.
So what are the names of the columns that are numeric?
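As a sketch of that selection step, with an assumed data frame name and a couple of made-up columns mixing numeric and nominal attributes:

```r
# A small stand-in data frame mixing numeric and nominal columns
# (names and values are assumptions, for illustration).
census <- data.frame(age            = c(25, 40),
                     workclass      = c("Private", "State-gov"),
                     hours.per.week = c(40, 60),
                     stringsAsFactors = FALSE)

numeric.cols <- sapply(census, is.numeric)  # TRUE/FALSE per column
names(census)[numeric.cols]                 # names of the numeric columns
census.numeric <- census[, numeric.cols]    # keep only those columns
```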
So let's have a look at just a range of these attributes, to make life a little bit easier.
So I'm gonna run this line,
and so this is a simplified version of what I was just showing; you can see that capital gain is
massive compared to the hours per week,
for example.
Let's have a look at the standard deviation.
Recall that the standard deviation
is the average distance from the mean, so it kind of gives us an idea of the spread of some data.
Like, is it very tight, and everyone earns roughly the same, or is it very spread out, and there are huge

deviations? And the answer is: there are pretty huge deviations. So the age has a standard deviation of 13,
so obviously
that means that most people are going to be kind of in the middle, and on average
they're going to be 13 years younger or older than that, but you can see that things like capital gain have a
7,000 standard deviation, which is a huge amount. To give you some idea of what we're aiming for,
it's very common to standardize this kind of data so that the standard deviation is 1, right. So
7,000 is much too big. Let's plot an example that gives you some idea of what kind of problem we have with these massive
ranges. So I'm going to plot here a graph of age vs. capital gain, right?
We know age goes between about one and a hundred, and capital gain is much, much larger.
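A rough sketch of the kind of plot being run here, using simulated values rather than the real census data, since the plotting line itself isn't shown in the transcript:

```r
# Simulate ages 17-90 and a capital-gain column where a few people
# have very large values and most are zero, as in the census data.
set.seed(1)
age  <- sample(17:90, 500, replace = TRUE)
gain <- ifelse(runif(500) < 0.08, runif(500, 0, 100000), 0)

# Because gain spans 0-100,000 and age only 17-90, almost every
# point ends up squashed along the bottom of the figure.
plot(age, gain, xlab = "Age", ylab = "Capital gain")
```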
So if I run this, basically the figure makes no sense at all, because the capital gain ranges from zero to one hundred
thousand, and as a few people earn right at the top of the scale, everything is sort of squished down at the bottom. We can't see anything
that's going on. There's no way of telling whether the capital gain of an individual is related to their age.
I mean, it probably is, because retired people, and people who are very young, perhaps earn slightly less.

We can't really see that here, because it's just too compressed, right?
We need to start trying to bring these things together so that we can perform better analysis.
What we're going to do is create a new data frame with just the numerical attributes,
which we want to focus on, just to make our life a little bit easier, and then we're going to write a normalise function to
move all our data to between 0 and 1, and we will do this per attribute.
So, for example,
if you've got some data which goes between a minimum and a maximum,
and we want to scale this data to between 0 and 1,
all we need to do is first of all take away the minimum, and that's going to move everything to be from 0
to max minus min, and then we're going to divide by this distance here.
So this is max minus min, and if we divide by this, everything is going to go from 0 to 1.
So that's exactly what we're doing in this function here:
we've got a function of x, and it subtracts the minimum of x and then divides by the difference between the maximum and the minimum.
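That min-max function, written out as a sketch (the function name is an assumption):

```r
# Scale a numeric vector to the 0-1 range: shift down by the
# minimum, then divide by the total spread (max - min).
normalise <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

normalise(c(1, 40, 60, 99))   # 0, then roughly 0.398, 0.602, and 1
```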
Right. So this is very standard. So I'm going to run this. I'll let you write functions like this and then use them in

applications over data. So we're going to calculate a normalised census data set, which is: we're going to apply,
over dimension 2, this normalise function
we just wrote. And now, if we look at the range, we'll see that our range is
between 0 and 1 for all of our data, which is exactly what we want.
Normalization is a perfectly good way of handling your data:
if everything is between 0 and 1, we have fewer problems with the scale of things being way off.
Some statistical techniques, like the PCA we're going to talk about in another video,
require standardized data: that's data that's centered around zero,
with a mean of zero and a standard deviation of one. We can standardize data pretty easily in the same way.
Actually, we don't need to write our own function for this: the scale function in R performs this for us.
So we're going to take the census data, over the numerical attributes, and we're going to call the scale function, and that's going to take all
of the attributes and center them around their mean, so the mean will become close to zero, and it's going to divide them
all by the standard deviation, so their standard deviation becomes one.
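A sketch of that call, again with a toy numeric data frame standing in for the real one:

```r
# Toy stand-in for the numeric census columns (values are made up).
census.numeric <- data.frame(age            = c(17, 35, 52, 90),
                             capital.gain   = c(0, 0, 5000, 99999),
                             hours.per.week = c(1, 40, 60, 99))

# scale() centres each column on its mean and divides by its
# standard deviation; it returns a matrix, so wrap it back up.
census.standardised <- as.data.frame(scale(census.numeric))

colMeans(census.standardised)     # effectively zero (floating-point dust)
sapply(census.standardised, sd)   # every column now has sd 1
```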
So if we run that and then we have a look at the mean of this data

So, for example, here we calculate the mean, and you can see that these values are very, very close to zero.
That's 10 to the minus 17, or something like that, very, very small. And if we look at the standard deviations,
similarly, they're all going to be 1. All right, so this is now standardized data.
This is a very good thing to do
if you want to use your data in some kind of machine learning algorithm or some kind of clustering.
Let's imagine now that we want to join some data sets together.
So we've standardized our data: everything's between 0 and 1, or it's centered around 0 with a standard deviation of 1; we've codified some attributes.
What happens if we get other data from other sources? You can imagine that census data from the US might be a bit useful,
but maybe we want census data from Spain, or from the UK, or from another country.
Can we join all of them together to get a bigger, more useful data set?
Now, the thing to think about when you're doing this is just to make sure that everything makes sense,
right? Are the scales the same? Are they all normalized, or none of them normalized?
Because otherwise, what you're going to be doing is, you're going to be adding, you know,

pay between naught and a hundred thousand to pay between naught and one; nothing makes any sense anymore.
You're gonna wreck your data. So let's have a look at this on the census data set.
We have some Spanish census data in a very similar format to our census data from the United States.
Let's have a quick look, so I'm going to read the CSV file of the Spain data.
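The read step might look like this; the file name is an assumption:

```r
# Read the Spanish census data into a data frame and peek at it.
spain <- read.csv("spain.csv", stringsAsFactors = FALSE)
head(spain)   # first six rows
```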
Let's remind ourselves of the columns that we had in our census data from the United States. These are the numerical columns:
so we have age,
education number, capital gain, capital loss, this kind of thing.
Let's look at the Spanish data set to see if we can just join the two together,
so I'm gonna run head on Spain; that's going to give us the first few rows, and you
can see that some of the stuff in there is as it was before, so things like what their level of education is,
whether they work in the private sector or the public sector, right, so we're going to need to remove these things to create just numerical
attributes. And the other problem is, if you look carefully,
you'll see that the capital gain in the Spanish dataset is in euros, not in dollars. Now, that's a huge problem.
It's not that they're massively different, obviously;
they're on the same order of magnitude.

But we don't want to be jamming capital gain in euros next to dollars, because those two scales are not the same, right?
So what we need to do first is scale this data using some kind of exchange rate.
So here, what we're going to do is create a new column in the Spain data:
so, given a Spain data frame, we're going to say the Spain capital gain is equal to the
euro capital gain times 1.13, which is the exchange rate we're going to use. Now,
it's quite important in this kind of situation not just to look up the exchange rate online.
You've got to consider that this might have been collected a while ago:
what was the exchange rate when this data was collected? These are things you're going to have to think about.
So let's run that line, and let's do the same thing for the capital loss now.
We're going to keep just the numerical attributes of our census data and of the Spanish data,
and we're also going to add another column: that is, what country they come from,
otherwise
we're not going to know. So we're going to use the cbind function to combine the census data's numerical attributes and
the native country, which in this case will be the United States.
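A sketch of those two steps; the data frame, its column names, and the exchange rate are all assumptions for illustration:

```r
eur.to.usd <- 1.13   # assumed rate for when the data was collected

# Toy Spanish data with money columns in euros (made-up values).
spain <- data.frame(age          = c(30, 45),
                    capital.gain = c(1000, 0),
                    capital.loss = c(0, 200))

# Convert the euro amounts into dollars so the scales match.
spain$capital.gain <- spain$capital.gain * eur.to.usd
spain$capital.loss <- spain$capital.loss * eur.to.usd

# Tag each row with its country of origin before combining.
spain <- cbind(spain, native.country = "Spain")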


Turkish: 
İspanya verileri için de aynı şeyi yapacağız, ki bu temelde kesinlikle aynı olacaktır.
biz de yerel ülke olarak ispanyaya sahip olacağız ve
Sonra, bu iki tabloyu bir araya getirmek için, karaca bağlama özelliğini kullanacağız
Şimdi bu sadece bu iki veri kümesinin aynı özelliklere sahip olması durumunda işe yarar.
Yeni duyu bulunamadı
Neyi yanlış yaptım? Yani bir yazım hatası mı vardı? Öyleyse bu ikisini birbirine bağlayalım.
Oraya gidiyoruz. Ve böylece Birleşik veri setimiz artık Birleşik Devletler ve İspanya için birleştirilmiş gözlemlere sahiptir.
yapmak istemediğiniz şey sadece onlara katılmak ve sadece biraz olsun isteyip istemediğinize bırakın
Verilerin dağılımını sağlamak için bazı grafiklere bakın. Sadece bir araya geldin, mantıklı geldin. Örneğin
Sağ. Bu nedenle, Amerika Birleşik Devletleri verileri farklı yaşlardan oluşan geniş bir dağılıma sahiptir.
İspanya verilerinin aynı dağılıma sahip olduğundan emin olmak istiyoruz.
Aksi takdirde, veri kümenizi güvence altına alacaksınız.
örneğin
Kabaca sermaye kazancı seviyelerinin olup olmadığına bakalım.

Japanese: 
私たちはスペインのデータについても全く同じことをするつもりです。
私達はまた母国としてスペインを持つことになるでしょう
それでは、これら2つのテーブルを結合するために、roe bind機能を使用します。
これら2つのデータセットがまったく同じ属性を持っている場合にのみ機能します。
新しい感覚が見つからない
私は何をしましたか？だから私はタイプミスがありましたか？それでは、バインドを使用してこれら2つを結合しましょう。
そこに行きます。そして、私たちの米国のデータセットは、現在、米国とスペインの合計観測値を持っています。
あなたがしたくないのは、ただ一緒に参加してそのままにしておくことです。
いくつかのプロットを見て、データの分布が確実になるようにします。あなたはちょうど一緒に参加しました意味があります。例えば
右。このように、米国のデータはさまざまな年齢の素晴らしい広い分布を持っています
スペインのデータも同じ分布になるようにしたい
そうでなければ、あなたはあなたのデータセットを保護しようとしているのです。
だから例えば
キャピタルゲインのレベルが次のとおりであるかどうかをおおよそ見てみましょう。

English: 
We're going to do the exact same thing for the Spain data, which will be basically exactly the same, except obviously
we're also going to have Spain as the native country, and
then we're going to use the rbind function to just join those two tables together.
Now, that will only work if those two datasets have the exact same attributes.
Object not found?
What did I do wrong? So I had a typo. So let's join these two together using rbind.
There we go. And so our united dataset now has the combined observations for the United States and Spain.
Now, what you wouldn't want to do is just join them together and leave it at that. You want to perhaps have a little
look at some plots to make sure that the distributions of the data you've just joined together make sense. For example,
the United States data has a nice broad distribution of different ages;
we want to make sure that the Spanish data has that same distribution.
Otherwise, you're kind of going to skew your data set.
So, for example,
let's have a look at roughly whether the levels of capital gain are
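The row-wise bind can be sketched in pandas with `pd.concat`. One caveat worth a comment: R's rbind errors out on mismatched column names, whereas pandas will silently fill NaN, so it pays to check the columns explicitly. The two toy tables below are illustrative stand-ins.

```python
import pandas as pd

# Two small stand-in tables with identical columns (values are made up)
us = pd.DataFrame({"age": [39, 50],
                   "capital_gain": [2174.0, 0.0],
                   "native_country": ["United-States", "United-States"]})
spain = pd.DataFrame({"age": [41, 33],
                      "capital_gain": [1130.0, 0.0],
                      "native_country": ["Spain", "Spain"]})

# Guard against mismatched attributes: unlike R's rbind, pandas would
# not raise an error here, it would just pad missing columns with NaN
assert list(us.columns) == list(spain.columns)

# Row-wise bind, the analogue of rbind in the video
united = pd.concat([us, spain], ignore_index=True)
print(len(united))  # combined observations from both countries
```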

English: 
approximately the same for both the United States and the Spanish data sets. So I'm going to use ggplot for this.
We're going to plot a bar chart where we've colour-coded the United States and Spain, and you can see that, broadly speaking,
there's a lot around zero, or less than 50k, and then there's a few a little bit above.
All right, so that looks, broadly speaking, like the same distribution. I'm fairly happy with that.
This is going to be a judgement call when you get your own data.
So I'll clear the screen, and then let's have a look at the next plot.
The next plot is going to be capital loss versus the native country. Let's make sure those distributions are the same.
So it's plotting there, and broadly speaking, again,
yes:
the majority are down the bottom, and then there's a few United States ones and a couple of Spanish ones up at the top as
well. Again, it's not a disaster; that's probably okay. Finally, let's have a look at ages by native country.
So if we plot this, we can see two very, very similar distributions.
You can see that it's essentially a bell curve, maybe slightly skewed towards older participants,
for the United States, and very, very similar for Spain. This is okay. If we
hypothesized that capital gain, capital loss and salary were something to do with your age,
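A non-graphical stand-in for that ggplot sanity check: rather than eyeballing a bar chart, you can compare summary statistics of a column per country before trusting the joined data. The ages below are made-up illustrative values.

```python
import pandas as pd

# Toy joined data set; the ages here are purely illustrative
united = pd.DataFrame({
    "age": [25, 38, 44, 52, 61, 27, 39, 45, 50, 63],
    "native_country": ["United-States"] * 5 + ["Spain"] * 5,
})

# Compare the age distribution per country numerically; the means
# (and spreads) should be close if the distributions are comparable
summary = united.groupby("native_country")["age"].agg(["mean", "std"])
print(summary)

gap = abs(summary.loc["Spain", "mean"] - summary.loc["United-States", "mean"])
print(gap)  # small gap here, so the two groups look comparable
```

It is still a judgement call where to draw the line, exactly as with the plots.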

English: 
then it would make sense for the two data sets that you're joining together to have very similar distributions in this regard.
So let's look at one more data set, from Denmark. All right, so it's the same thing, same format.
We're going to read the CSV, and we're going to have a look at just the top few rows to make sure it's in the same
format, so that's using the head function, and you can see we've actually already removed the
nominal and other text attributes from here, and we've just got the
numerical ones. And actually,
capital gain and capital loss are already in dollars in this data set, so we don't have to perform a conversion, so we can use
rbind to put these two things together, and now we just need to check the distributions are the same.
So again,
we're going to plot the age
against the native country and see if these have the same
distributions. And actually, you can see this isn't looking too good: the United States and the Spanish datasets have very similar
distributions, but the participants, the people who have been polled from Denmark, are much, much older on average, right?
This could have an effect on things like capital gain, so I wouldn't necessarily feel comfortable
just joining this data set in without thinking about it a little bit more closely.
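That Denmark check can also be automated: flag any country whose mean age strays far from the overall mean before binding it in. The ages and the 10-year threshold below are arbitrary illustrative choices, not values from the video.

```python
import pandas as pd

# Stand-in data where Denmark's respondents are much older on average
united = pd.DataFrame({
    "age": [38, 48, 56, 40, 47, 55, 70, 74, 78],
    "native_country": (["United-States"] * 3 + ["Spain"] * 3
                       + ["Denmark"] * 3),
})

# Flag countries whose mean age is far from the overall mean;
# the 10-year cut-off is an arbitrary illustrative threshold
overall = united["age"].mean()
means = united.groupby("native_country")["age"].mean()
suspicious = means[(means - overall).abs() > 10].index.tolist()
print(suspicious)  # countries worth a closer look before joining
```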

English: 
Alright, so whenever you're joining data sets like this, taking data from different sources,
think carefully,
to make sure that it's fair and what you're doing is a reasonable
concatenation of datasets.
And actually, these are the features that power Spotify's recommender system, and numerous others. So we've got things like acousticness:
how acoustic does it sound, from a zero to a one? We've got instrumentalness.
I'm not convinced that's a word. Speechiness: to what extent is it speech or not speech? And then things like tempo
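To make those features concrete, here is a small sketch in the style of Spotify's audio features, where acousticness, instrumentalness and speechiness range from 0 to 1 and tempo is in BPM. The values are made up, and the distance function is just one simple illustration of how a recommender might compare two tracks, not Spotify's actual method.

```python
import math

# Illustrative feature vectors; the values are invented
track_a = {"acousticness": 0.85, "instrumentalness": 0.90,
           "speechiness": 0.04, "tempo": 120.0}
track_b = {"acousticness": 0.80, "instrumentalness": 0.88,
           "speechiness": 0.05, "tempo": 118.0}

# Euclidean distance between feature vectors, with tempo rescaled
# onto roughly 0-1 so one feature doesn't dominate (the same scaling
# concern as euros next to dollars earlier)
def distance(a, b, tempo_scale=200.0):
    d = 0.0
    for key in a:
        scale = tempo_scale if key == "tempo" else 1.0
        d += ((a[key] - b[key]) / scale) ** 2
    return math.sqrt(d)

print(distance(track_a, track_b))  # small distance: similar tracks
```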
