Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
Today, the game we’ll be talking about is
six-player no-limit Texas Hold’em poker, which
is one of the more popular poker variants
out there.
And the goal of this project was to build
a poker AI that never played against a human
before and learns entirely through self-play,
and is able to defeat professional human players.
During these tests, two of the players it
was tested against were former World Series
of Poker Main Event winners.
And of course, before you ask, yes, in a moment,
we’ll look at an example hand that shows
how the AI traps a human player.
Poker is very difficult to learn for AI bots
because it is a game of imperfect information.
For instance, chess is a game of perfect information
where we see all the pieces and can make a
good decision if we analyze the situation
well.
However, not so much in poker, because only
at the very end of the hand do the players
show what they have.
This makes it extremely difficult to train
an AI to do well.
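To make this concrete, here is a tiny sketch of what "imperfect information" means from one seat at the table. The specific cards here are made up for illustration: from a player's point of view, any two of the unseen cards could be in an opponent's hand, and the AI must reason over all of those possibilities at once.

```python
from itertools import combinations

# Build a 52-card deck as rank + suit strings, e.g. "Qh".
ranks = "23456789TJQKA"
suits = "cdhs"
deck = [r + s for r in ranks for s in suits]

# Suppose we hold Qh Jd and the flop shows Qs 7c 2d
# (hypothetical cards, chosen only for this illustration).
visible = {"Qh", "Jd", "Qs", "7c", "2d"}

# Every 2-card combination of the 47 unseen cards is a possible
# opponent holding -- the hidden state the AI has to reason about.
unseen = [c for c in deck if c not in visible]
possible_holdings = list(combinations(unseen, 2))
print(len(possible_holdings))  # 47 choose 2 = 1081 possibilities
```

In chess, by contrast, the full board state is visible, so this set of hidden possibilities simply does not exist.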
And now, let’s have a look at the promised
example hand here.
We talked about imperfect information just
a moment ago, so I’ll note that all the
cards are shown face up for us to make the
analysis of this hand easier. Of course, this
is not how the hands were played.
You see the AI up there marked with P2 sitting
pretty with a Jack and a Queen, and before
the flop happens, which is when the first
three cards are revealed, only one human player
seems to be interested in this hand.
During the flop, the AI paired its Queen and
has a Jack as a kicker, which, if played well,
is going to be disastrous for the human player.
So, why is that?
You see, the human player also paired their
Queen, but has a weaker kicker and will therefore
lose to the AI’s hand.
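The kicker logic here can be sketched in a few lines. When two players hold the same pair, the highest unpaired card breaks the tie; the helper below is a hypothetical simplification (a real hand evaluator handles every hand class), and the human's weaker kicker is assumed for illustration:

```python
# Map ranks to numeric strength: 2 is weakest, Ace is strongest.
RANK = {r: i for i, r in enumerate("23456789TJQKA", start=2)}

def one_pair_strength(pair_rank, kickers):
    """Score a one-pair hand: the pair rank decides first,
    then the kickers, from highest to lowest."""
    return (RANK[pair_rank], sorted((RANK[k] for k in kickers), reverse=True))

# The AI paired its Queen with a Jack kicker; the human also paired
# a Queen but with a weaker kicker (a nine, assumed for this example).
ai = one_pair_strength("Q", ["J"])
human = one_pair_strength("Q", ["9"])
print(ai > human)  # True: equal pairs, so the higher kicker wins
```

Python's tuple comparison does the work: equal first elements fall through to the kicker list, which is exactly how poker ranks these hands.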
In this case, this player thinks they have
a strong hand and will get lots of value out
of it… only to find out that they will be
the one milked by the AI.
So, how exactly does that happen?
Well, look here carefully!
The bot shows weakness by checking here, to
which, the human player’s answer is a small
raise.
The bot, again, shows weakness by just calling
this raise, and checking again on the turn,
essentially saying “I am weak, don’t hurt
me!”.
By the time we get to the river, the AI, again,
appears weak to the human player, who now
tries to milk the bot with a mid-sized raise…
and, the AI recognizes that now is the time
to pounce, the confused player calls the bet
and gets milked for almost all their money.
An excellent slow play from the AI.
Now, note that one hand is difficult to evaluate
in isolation. This was a great hand indeed,
but we need to look at entire games to get
a better grasp of the capabilities of this
AI.
So if we look at the dollar-equivalent value
of the chips in the game, the AI was able
to win a thousand dollars from these five professional
poker players… every hour.
It also uses very few resources: it can be
trained in the cloud for only several hundred
dollars, and it exceeds human-level performance
within only 20 hours.
What you see here is a decision tree that
explains how the algorithm figures out whether
to check or bet, and as you see here, this
tree is traversed in a depth-first way: so
first, it descends deep into one possible
decision, and later, as more options are
unrolled and evaluated, the probabilities of
these choices are updated above.
In simpler words, first, the AI seems somewhat
sure that checking would be a good choice
here, but after carefully evaluating both
decisions, it is able to further reinforce
this choice.
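The general idea behind this kind of probability update can be shown with a toy sketch. This is regret matching, the update rule underlying the counterfactual-regret family of poker AIs — an assumption on my part, since the video does not name the exact algorithm — and the payoff numbers below are made up, not from the paper:

```python
# Toy regret-matching update for one decision point with two actions.
# Not the paper's method -- just a sketch of how evaluating subtrees
# can shift the probability toward the better-looking action.
actions = ["check", "bet"]
regret = {a: 0.0 for a in actions}

def strategy():
    """Play each action in proportion to its positive regret."""
    positive = {a: max(r, 0.0) for a, r in regret.items()}
    total = sum(positive.values())
    if total == 0:
        # No regrets accumulated yet: start from a uniform strategy.
        return {a: 1.0 / len(actions) for a in actions}
    return {a: p / total for a, p in positive.items()}

# Suppose one depth-first traversal of the tree estimated these
# payoffs for each action (made-up numbers for illustration):
payoff = {"check": 0.4, "bet": 1.0}

# Regret of an action = how much better it would have done than
# the current strategy's expected payoff.
expected = sum(strategy()[a] * payoff[a] for a in actions)
for a in actions:
    regret[a] += payoff[a] - expected

print(strategy())  # "bet" now gets more probability than "check"
```

After one update, the strategy has shifted toward betting, mirroring the video's description of probabilities being revised above as deeper options are evaluated.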
One of the professional players noted that
the bot is a much more efficient bluffer than
a human and always puts on a lot of pressure.
Now note that this is also a general learning
technique and is not tailored specifically
for poker, and as a result, the authors of
the paper noted that they will also try it
on other imperfect information games in the
future.
What a time to be alive!
This episode has been supported by Weights
& Biases.
Weights & Biases provides tools to track your
experiments in your deep learning projects.
It can save you a ton of time and money in
these projects and is being used by OpenAI,
Toyota Research, Stanford and Berkeley.
It is really easy to use, in fact, this blog
post describes how you can visualize your
Keras models with only one line of code.
When you run this model, it will also start
saving relevant metrics for you and here you
can see the visualization of the mentioned
model and these metrics as well.
That’s it.
You’re done!
It can do a lot more than this, of course,
and, you know what the best part is?
The best part is that it’s free and will
always be free for academics and open source
projects.
Make sure to visit them through wandb.com/papers
or just click the link in the video description
and sign up for a free demo today.
Our thanks to Weights & Biases for helping
us make better videos for you.
Thanks for watching and for your generous
support, and I'll see you next time!
