
English: 
Yes, I beat it. Did that impress you? If I built an AI to beat this for me, would that impress you?
Hello World, welcome to Sirajology! In this
episode we're going to build an AI to beat
a bunch of Atari games. Games have had a long
history of being a testbed for AI ever since
the days of Pong. Traditionally, game programmers
have taken a reductionist approach to building
AI. They've reduced the simulated world to
a model and had the AI act on prior knowledge
of that model. And it worked out for the most
part. I guess. Not really. But what if we
want to build an AI that can be used in several
different types of game worlds? All the world
models are different so we couldn't feed it
just one world model. Instead of modeling
the world, we need to model the mind.
We want to create an AI that can become a
pro at any game we throw at it. So in thinking
about this problem, we have to ask ourselves
-- what is the dopest way to do this? Well,
the London-based startup DeepMind already
did this in 2015. DeepMind's goal is to create
artificial general intelligence, that's one
algorithm that can solve any problem with
human level thinking or greater. They reached
an important milestone by creating an algorithm
that was able to master 49 different Atari
games with no game-specific hyperparameter
tuning whatsoever. Google snapped them up
like yooooooooo. The algorithm is called the
Deep Q Learner and it was recently made open
source on GitHub. It only takes two inputs
-- the raw pixels of the game and the game
score. That's it. Based on just that it has
to complete its objective: maximize the score.
Let's dive into how this works, since we'll
want to recreate their results.
First it uses a deep convolutional neural
network to interpret the pixels. This is a
type of neural network inspired by how our
visual cortex operates, and expects images
as inputs. Images are high dimensional data
so we need to reduce the number of connections
each neuron has to avoid overfitting. Overfitting,
by the way, is when your model is too complex:
there are too many parameters, so it's overly
tuned to the data you've given it and won't
generalize well to any new dataset. So unlike
a regular neural network, a convolutional
network's layers are stacked in 3 dimensions
and this makes it easy to connect each neuron
ONLY to neurons in its local region instead
of every single other neuron. Each layer acts
as a detection filter for the presence of
specific features in an image and the layers
get increasingly abstract with feature representation.
So the first layer could be a simple feature
like edges, then the next layer would use
those edges to detect simple shapes, and the
next one would use those shapes to detect
something even more complex like Kanye. These
hierarchical layers of abstraction are what
neural nets do really well.
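To make that "local region" idea concrete, here's a minimal sketch in plain Python (a toy filter, not anything from DeepMind's actual network) of one convolutional filter sliding over an image, where each output value depends only on a small patch of pixels instead of the whole image:

```python
def convolve2d(image, kernel):
    """Slide a small filter over the image; each output value
    depends only on a local patch, not on every pixel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum over the local (kh x kw) patch only
            patch_sum = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            )
            row.append(patch_sum)
        out.append(row)
    return out

# A vertical-edge detector: responds where brightness rises left to right
edge_kernel = [[-1, 1],
               [-1, 1]]

# 4x4 "image": dark on the left, bright on the right
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]

feature_map = convolve2d(image, edge_kernel)
print(feature_map[0])  # strongest response at the dark-to-bright boundary
```

A real convolutional layer learns many such kernels at once and stacks them in 3 dimensions, but the local-connectivity trick is exactly this.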
So once it's interpreted the pixels, it needs
to act on that knowledge in some way. In a
previous episode we talked about supervised
and unsupervised learning. But wait, there
is another, and his name is John Cena! It's
called Reinforcement Learning. Reinforcement
learning is all about trial and error. It's
about teaching an AI to select actions to
maximize future rewards. It's similar to how
you would train a dog. If the dog fetches
the ball you give it a treat, if it doesn't
then you withhold the treat. So while the
game is running, at each time step, the AI
executes an action based on what it observes
and may or may not receive a reward. If it
does receive a reward, we'll adjust our weights
so that the AI will be likely to do a similar
action in the future. Q Learning is the type
of reinforcement learning that learns the
optimal action-selection behavior or policy
for the AI without having a prior model of
the environment. So based on the current game
state, like an enemy spaceship being in shooting
distance, the AI will eventually know to take
the action of shooting it. This mapping of
state to action is its policy and it gets
better and better with training. Deep Q also
uses something called experience replay, which
means the AI learns from the dataset of its
past policies as well. This is inspired by
how our hippocampus works: it replays past
experiences during rest periods, like when
we sleep.
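Here's a rough, toy-scale sketch of both ideas, the Q-learning update rule and an experience replay buffer. This is the tabular textbook version in plain Python; DeepMind's Deep Q Learner uses a neural net in place of the table, and the state and action names here are made up for illustration:

```python
import random
from collections import defaultdict, deque

ALPHA, GAMMA = 0.1, 0.99  # learning rate and discount factor

# Q-table mapping (state, action) -> expected future reward
Q = defaultdict(float)
ACTIONS = ["left", "right", "shoot"]

def q_update(state, action, reward, next_state):
    """One step of the classic Q-learning update rule."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# Experience replay: store past transitions, learn from random samples
replay_memory = deque(maxlen=50_000)

def remember(transition):
    replay_memory.append(transition)  # (state, action, reward, next_state)

def replay(batch_size=32):
    batch = random.sample(replay_memory, min(batch_size, len(replay_memory)))
    for state, action, reward, next_state in batch:
        q_update(state, action, reward, next_state)

# Toy example: shooting while an enemy is "in_range" pays off
remember(("in_range", "shoot", 1.0, "start"))
for _ in range(100):
    replay()
print("learned value:", round(Q[("in_range", "shoot")], 3))
```

After enough replays the table's value for shooting in range converges toward the reward, which is the "mapping of state to action" getting better with training.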

So we're going to build our game bot in just
10 lines of Python using a combination of
Tensorflow and Gym. Tensorflow is google's
ML library which we'll use to create the convolutional
neural net, and Gym is OpenAI's ML library
which we'll use to create our reinforcement
learning algorithm and set up our environment.
Oh, If you haven't heard, OpenAI is a non-profit
AI research lab focused on creating AGI in
an open source way. They've got a billion
bucks pledged from people like Elon Musk so
yeah. Elon Musk.
Let's start off by importing our dependencies.
Environment is our helper class that will
help initialize our game environment. In our
case, this will be space invaders, but we
can easily switch that out to a whole host
of different environments. Gym is very modular,
OpenAI wants it to be a gym for AI agents
to train in and get better. You can submit
your algorithm to their site for an evaluation
and they'll 'score' it against a set of metrics
server-side. The more generalized the algorithm,
the better -- and everybody's attempts can
be viewed online so it makes sharing and collaborating
a whole lot easier. I approve. We'll also
want to import our deep q network helper class
to help observe the game and our training
class to initialize the reinforcement learning.
Once we've imported our dependencies, we can
go ahead and initialize our environment. We'll
set the parameter to Space Invaders, and then
initialize our agent using our DQN helper
class with the environment and environment
type as the parameters. Once we have that
we can start training by running the trainer
class with the agent as the parameter. First,
this will populate our initial replay memory
with 50,000 plays so we have a little experience
to train with. Then it will initialize our
convolutional neural network to start reading
in pixels and our Q learning algorithm to
start updating our agent's decisions based
on the pixels it receives. This is an implementation
of the classic "agent-environment loop". Each
timestep, the agent chooses an action, and
the environment returns an observation and
a reward. The observation is raw pixel data
which we can feed into our convolutional network,
and the reward is a number we can use to help
improve our next actions. Gym neatly returns
these parameters to us via the step function
which we've wrapped in the environment helper
class. During training, our algorithm will
periodically save the 'weights' to a file
in the models directory so we'll always have
a partially trained model at least.
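That agent-environment loop can be sketched like this. Note this uses a stub environment with the same reset/step interface rather than the real Gym call (which would be something like gym.make('SpaceInvaders-v0')), so it runs without any installs, and a random agent stands in for the trained DQN:

```python
import random

class SpaceInvadersStub:
    """Stand-in for a Gym environment, exposing the same
    reset/step interface as the real thing."""
    def reset(self):
        self.t = 0
        return "pixels_0"          # observation: a raw pixel frame

    def step(self, action):
        self.t += 1
        observation = f"pixels_{self.t}"
        reward = 1.0 if action == "shoot" else 0.0
        done = self.t >= 10        # episode ends after 10 timesteps
        return observation, reward, done, {}

# The classic agent-environment loop
env = SpaceInvadersStub()
observation = env.reset()
total_reward = 0.0
done = False
while not done:
    action = random.choice(["left", "right", "shoot"])  # random agent
    observation, reward, done, info = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)
```

Swapping the random choice for the DQN's action selection, and feeding each observation back through the network, is the whole training setup in miniature.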
Expect it to take a few days to fully train
this to human level. Once we've started training,
we can start the game with the play function
of our agent object. We can go ahead and run
this in terminal and the space invaders window
should pop up and we'll start seeing the AI
start attempting to play the game. It'll be
hilariously bad at first but will slowly get
better with time. We can see
in terminal a set of metrics periodically
printed out so we can see how the agent is
doing as time progresses. The AI will get
more difficult to defeat the longer you train
it and ideally you can apply it to any game
you create. Video games and other simulated
environments are the perfect testing grounds
for building AI since you can easily observe
its behavior visually. For more info, check
out the links down below and please subscribe
for more machine learning videos. For now
I've gotta go fix a runtime error, so thanks
for watching!
