In mid-2007, several months into building the first version of my poker bot, I decided to seek some help with the bot’s poker strategy. I was pretty good at Heads Up Sit-n-Gos, but I didn’t have a lot of experience playing full ring games, which is what I was building the bot for. I posted a thread in a poker forum explaining that I had a working bot but that I needed help with strategy and that anyone who wanted to help should email me.
Several folks contacted me, but one stood out above the rest. This guy was not only an extremely talented player, but he also possessed a deep analytical ability that helped him understand why he made the decisions he made. So with him guiding the strategy, we set out to build a profitable NL200 ($1/$2) six-max poker bot.
As you’ll see in a minute, he was brilliant. A genius. He knew his stuff inside and out. As I read over the first strategy email he sent me I couldn’t help but think that we were going to completely destroy the tables.
However, the more we worked, the more he realized how complex it was. It’s funny: you can teach someone to play decent poker in half an hour so it seems like it shouldn’t be that hard to teach a computer. After all, you have access to millions of hands and more processing power than you could ever need, so how hard could it be?
It’s pretty hard. If you doubt it, try writing down a winning strategy and giving it to someone who has never played poker before. Have them join a game and tell the person to follow your directions no matter what and never to deviate even when they think it makes sense to do so. One of the things you’ll quickly discover is that a lot of factors influence your decisions. At least they should. It’s not just your hole cards and the hand you make that you have to factor in: your position, the stack sizes, the action leading up to that point, the table dynamics, your opponents’ styles, the meta game, and a dozen other factors help you decide what to do. How do you tell a poker bot to make a decision based on all these factors? That’s the tricky part. But if you don’t, you’re going to have a tough time building a winning full stack no limit bot.
He understood this well. Rather than trying to write a rule for every possible situation, he took a hybrid approach: some rules to identify the situation, and then a mostly quantitative approach to decide what action to take from there.
Below is one of his first emails, detailing his proposed preflop strategy. I’m posting it here because it’s not going to help anyone build a winning bot and I think a lot of folks will find it interesting. It can be a bit hard to follow at times, but if you’re a hardcore poker player you’ll probably get a kick out of it.
###
Preflop Play
Defining and recalling factors
A database will be composed giving the values of 4 independent variables for each starting holdem hand. The values of each variable will be recalled when a hand is dealt. These 4 variables will be given names as follows: SE, IO, RIO, and BHF.
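A minimal sketch of such a lookup in Python, assuming one record per starting hand. The class name, the hand keys, and every number below are placeholders invented for illustration; the email does not give actual values.

```python
from dataclasses import dataclass, replace


@dataclass
class HandFactors:
    """The four per-hand values named in the email."""
    se: float
    io: float
    rio: float
    bhf: float


# Hypothetical table: real values would come from the strategy author.
HAND_TABLE = {
    "AA": HandFactors(se=10.0, io=6.0, rio=0.5, bhf=2.0),
    "KJs": HandFactors(se=4.0, io=5.0, rio=2.5, bhf=3.0),
    "72o": HandFactors(se=0.5, io=0.5, rio=1.0, bhf=1.0),
}


def recall_factors(hand: str) -> HandFactors:
    """Return a fresh copy so per-hand adjustments never touch the table."""
    return replace(HAND_TABLE[hand])
```

Returning a copy matters because the triggered formulas modify SE, IO, and BHF during a hand, and the stored baseline has to be intact when the next hand is dealt.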
Triggered formulas modifying and using the 4 factors
[Each of these formulas triggers every time the conditions in italics are met. Variables are “declared” in “Where” lines and apply to the whole code for the duration of one hand. “My variables” are extra variables in equations that are subjective and need to be modifiable by me. Stats are usually words or acronyms in parentheses and should be recognizable; each recalls a number already calculated and in memory.]
-Variables consisting of solely letters (A, B, C,…, Z), (ZA, ZB, ZC,…, ZZ), or (ZAA, ZAB, ZAC, …, ZZZ) are my variables.
-Variables containing letters YA through YZ are placeholder variables [used to add or subtract value to SE and IO on players’ exit from hands]
-Any time subscript x is used, it is a variable associated with player in x position [6 is utg 1 is BB]. Also, Y(1-6) = Y1 +Y2 +Y3 + Y4 +Y5 +Y6 . [In the actual code, you will probably have to figure out a different system than the subscript system I have here, and may have to write out each equation multiple times, but I leave it up to you how to translate this algorithm code.]
When a player raises
Where SRx is strength of the raise of the player in x position, defined below
SRx= (positional pfr of raiser)A + (size of the raise in BB)B+(SR(1-6) )ZW
YAx= (-C)(SRx^1.5)
SE=SE+YAx
YBx=D(SRx^1.5)
IO=YBx+IO
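The raise update above could be sketched in Python roughly as follows. This is an illustration, not the bot's actual code: the `state` dictionary layout and the function name are assumptions, SR(1-6) is read as the sum of the raise strengths already recorded this hand, and the constants A, B, C, D, and ZW are passed in as plain numbers.

```python
def on_raise(state, pos, positional_pfr, raise_size_bb, A, B, C, D, ZW):
    """Apply the "when a player raises" update for the raiser in seat `pos`.

    `state` holds the running SE and IO plus per-seat records, so the
    adjustments can be partly reversed if the raiser later folds.
    """
    # Assumption: SR(1-6) means the raise strengths recorded so far.
    prior_sr = sum(state["SR"].values())
    sr = positional_pfr * A + raise_size_bb * B + prior_sr * ZW
    # Assumption: sr is non-negative; a negative base to the power 1.5
    # is not a real number.
    ya = -C * sr ** 1.5
    yb = D * sr ** 1.5
    state["SR"][pos] = sr
    state["YA"][pos] = ya
    state["YB"][pos] = yb
    state["SE"] += ya
    state["IO"] += yb
    return sr
```

Keeping YAx and YBx per seat is what makes the later fold rule possible, since that rule needs to know how much each player contributed.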
When a player is the first player to cold call a raise
Where CCx is strength of the cold call in x position, defined below
CCx= (positional vpip of caller)E + (% 3 bet preflop of caller)F + (SR(1-6))G
YRx= (CCx)H
IO=IO+YRx
If CCx>I
YCx= (– J)(CCx^1.5)
SE=SE+YCx
If CCx=I or CCx<I
YDx= K(CCx^1.5)
SE=SE+YDx
When a player over calls a raise [calls a raise that has been called by another player]
Where CCx is strength of the cold call in x position, defined below
CCx= (positional vpip of this caller)E + (% 3 bet preflop of this caller)F + (SR(1-6))G – ZX(CC(1-6))
YQx= (CCx)H
IO=IO+YQx
If CCx>I
YCx= (– J)(CCx^1.5)
SE=SE+YCx
If CCx=I or CCx<I
YDx= K(CCx^1.5)
SE=SE+YDx
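The cold call and over call rules share one shape: compute CCx, always add to IO, then push SE down or up depending on whether CCx exceeds the threshold I. A hedged Python sketch, with function and parameter names chosen here for illustration:

```python
def cold_call_strength(positional_vpip, three_bet_pct, sr_total, E, F, G,
                       cc_total=0.0, ZX=0.0):
    """CCx. For an over call pass cc_total (the sum of earlier CC values)
    and ZX; for a first cold call leave both at 0."""
    return (positional_vpip * E + three_bet_pct * F + sr_total * G
            - ZX * cc_total)


def call_adjustments(cc, H, I, J, K):
    """Return (io_delta, se_delta) for a call of strength cc."""
    io_delta = cc * H                 # YRx or YQx
    if cc > I:
        se_delta = -J * cc ** 1.5     # YCx
    else:
        se_delta = K * cc ** 1.5      # YDx, covers CCx equal to or below I
    return io_delta, se_delta
```

As with the raise formula, this assumes CCx is non-negative so that the 1.5 power stays a real number.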
When a player open limps [limps when everyone who has already acted has folded]
Where SL is strength of limper, defined below
SLx=(positional vpip of limper)L+(positional pfr of limper)M
YEx=(SLx)N
IO=IO+YEx
YFx=(SLx)O
SE=YFx+SE
If SLx>I
YGx= (–P)(SLx^1.5)
SE=YGx+SE
If SLx=I or SLx<I
YHx= Q(SLx^1.5)
SE=SE+YHx
When a player overlimps [limps after a player has already limped]
Where SOL is strength of overlimper, defined below
SOLx=(positional vpip of limper)L+(positional pfr of limper)M+R(SLx)
YJx=(SOLx)N
IO=YJx+IO
YKx=(SOLx)O
SE=YKx+SE
If SOLx>I
YLx=(–P)(SOLx^1.5)
SE=SE+YLx
If SOLx=I or SOLx<I
YMx= Q(SOLx^1.5)
SE=YMx+SE
When we are first to act or all players have folded to us
Where BS is blind steal, defined below, and XX is a random integer between 0 and 9
If (the big blind’s fold bb to steal) > .70
BS= ((bb’s fold bb to steal) – .70)(XX)S=YN
If( the big blind’s fold bb to steal) < .30
BS = (.30 – (bb’s fold bb to steal))(XX)T=YO
YP= – (pos VPIP of sb)U + 1 – .2BS(number of players left to act)
BS=BS+YP
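A rough Python sketch of the blind steal calculation follows. The email does not say what BS should be when the big blind's fold-to-steal figure is between .30 and .70, so this sketch assumes 0 in that case; the function name and the optional `xx` argument (which lets a caller fix the random integer) are also assumptions.

```python
import random


def blind_steal(bb_fold_to_steal, sb_vpip, players_left, S, T, U, xx=None):
    """Return the final BS value for a first-in situation."""
    if xx is None:
        xx = random.randint(0, 9)     # XX: random integer from 0 to 9
    if bb_fold_to_steal > 0.70:
        bs = (bb_fold_to_steal - 0.70) * xx * S
    elif bb_fold_to_steal < 0.30:
        bs = (0.30 - bb_fold_to_steal) * xx * T
    else:
        bs = 0.0                      # assumption: not specified in the text
    yp = -sb_vpip * U + 1 - 0.2 * bs * players_left
    return bs + yp
```

Injecting `xx` keeps the random component but makes the formula reproducible when tuning S, T, and U.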
Any time a player in x position folds
IO=IO – .67(YBx+YRx+YQx+YEx+YJx)
SE=SE – .4(YAx +YCx +YDx +YFx+YGx+YHx+YKx+YLx+YMx)
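One way to support this fold rule is to record every adjustment against the seat that caused it. The class below is a hypothetical sketch of that bookkeeping, assuming all of a player's IO adjustments are summed into one bucket and all SE adjustments into another.

```python
from collections import defaultdict


class FoldLedger:
    """Tracks SE and IO plus the per-seat adjustments made to them."""

    def __init__(self, se, io):
        self.se = se
        self.io = io
        self._se_by_pos = defaultdict(float)
        self._io_by_pos = defaultdict(float)

    def add(self, pos, se_delta=0.0, io_delta=0.0):
        """Apply an adjustment caused by the player in seat `pos`."""
        self.se += se_delta
        self.io += io_delta
        self._se_by_pos[pos] += se_delta
        self._io_by_pos[pos] += io_delta

    def on_fold(self, pos):
        """Reverse .67 of that seat's IO changes and .4 of its SE changes."""
        self.io -= 0.67 * self._io_by_pos.pop(pos, 0.0)
        self.se -= 0.4 * self._se_by_pos.pop(pos, 0.0)
```

Popping the seat's totals on fold means a second fold event for the same seat has no further effect.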
{
Any time it is our first time to act EVALUATE ON EACH PLAYER WHO HAS NOT ACTED, NOT INCLUDING THE BLINDS, redeclaring variables each time
Where DCS is degree of calling station, defined below, DL is degree of LAG, defined below, DF is degree of folder, defined below
If (WTSD) > .37 and (Postflop AF) < 1.5 and (VPIP) > .30
DCS=(VPIP – .15)(2 – (Postflop AF))(WTSD – .1)V
SE=DCS(W)+SE
IO=DCS(X)+IO
If( WTSD) > .30 and (postflop AF )> 3.5 and (VPIP)> .25
DL=(WTSD)((postflop AF) – 2)(VPIP – .12)Y
SE=DL(Z)+SE
IO=DL(ZA)+IO
If WTSD < .19 and Fold flop to cbet > .70
DF=(.24-WTSD)(VPIP)((Fold flop to cbet)-.3)(ZB)
SE=SE+DF(ZC)
BHF=BHF+DF(ZD)
Any time it is our first time to act EVALUATE ON EACH PLAYER WHO HAS ALREADY ACTED AND IS STILL IN THE HAND, redeclaring variables each time
Where DCS is degree of calling station, defined below, DL is degree of LAG, defined below, DF is degree of folder, defined below
If (WTSD) > .37 and (Postflop AF) < 1.5 and (VPIP) > .30
DCS=(VPIP – .15)(2 – (Postflop AF))(WTSD – .1)ZE
SE=DCS(ZF)+SE
IO=DCS(ZG)+IO
If( WTSD) > .30 and (postflop AF )> 3.5 and (VPIP)> .25
DL=(WTSD)((postflop AF) – 2)(VPIP – .10)ZH
SE=DL(ZI)+SE
IO=DL(ZJ)+IO
If WTSD < .19 and Fold flop to cbet > .70
DF=(.24-WTSD)(VPIP)((Fold flop to cbet)-.3)(ZK)
SE=SE+DF(ZL)
BHF=BHF+DF(ZM)
Any time it is our first time to act EVALUATE ON EACH BLIND WHO HAS NOT ACTED, redeclaring variables each time
Where DCS is degree of calling station, defined below, DL is degree of LAG, defined below, DF is degree of folder, defined below
If (WTSD) > .37 and (Postflop AF) < 1.5 and (VPIP) > .30
DCS=(VPIP – .15)(2 – (Postflop AF))(WTSD – .1)ZN
SE=DCS(ZO)+SE
IO=DCS(ZP)+IO
If( WTSD) > .30 and (postflop AF )> 3.5 and (VPIP)> .25
DL=(WTSD)((postflop AF) – 2)(VPIP – .12)ZQ
SE=DL(ZR)+SE
IO=DL(ZS)+IO
If WTSD < .19 and Fold flop to cbet > .70
DF=(.24-WTSD)(VPIP)((Fold flop to cbet)-.3)(ZT)
SE=SE+DF(ZU)
BHF=BHF+DF(ZV)
}
For the bracketed area, whenever an “if” section is activated, keep track of how much our SE, BHF, and IO values are being modified. When the player who activated the “if” section folds, reverse 2/3 of the change to the IO, SE, or BHF value that was made. [For example, if a player has really fishy stats in the cutoff and adds DCS(W) to our SE, if that player folds preflop, subtract 2/3 DCS(W). Note however that the variables (DCS) and (W)’s values can change by the time he folds, but the 2/3 the actual amount added to SE needs to be subtracted.]
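The three profile tests in the bracketed area can be sketched as small Python functions. The function names and the single `scale` parameter (standing in for V, ZE, ZN, and so on, depending on which group the player is in) are assumptions; the thresholds are the ones written in the email, using the .12 offset that appears in two of the three LAG formulas.

```python
def degree_calling_station(wtsd, postflop_af, vpip, scale):
    """DCS, or 0.0 when the player does not match the profile."""
    if wtsd > 0.37 and postflop_af < 1.5 and vpip > 0.30:
        return (vpip - 0.15) * (2 - postflop_af) * (wtsd - 0.1) * scale
    return 0.0


def degree_lag(wtsd, postflop_af, vpip, scale, vpip_offset=0.12):
    """DL, or 0.0 when the player does not match the profile."""
    if wtsd > 0.30 and postflop_af > 3.5 and vpip > 0.25:
        return wtsd * (postflop_af - 2) * (vpip - vpip_offset) * scale
    return 0.0


def degree_folder(wtsd, fold_flop_to_cbet, vpip, scale):
    """DF, or 0.0 when the player does not match the profile."""
    if wtsd < 0.19 and fold_flop_to_cbet > 0.70:
        return (0.24 - wtsd) * vpip * (fold_flop_to_cbet - 0.3) * scale
    return 0.0


def reversal_on_fold(applied):
    """Amount to subtract when the profiled player folds: 2/3 of what
    was actually added at the time, not a recomputed value."""
    return applied * 2 / 3
```

For example, a caller would compute `applied = degree_calling_station(...) * W`, add it to SE, store `applied`, and later subtract `reversal_on_fold(applied)` if that player folds.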
Any time there is (dead money) in the pot
Define (dead money) as all money players have put into the pot who have folded, any money not attached to a hand [such as in a dead sb when a player posts out of position who was at the table], and any money we have previously put in the pot when it is again our turn to act.
(1+(dead money) / (2(size of the pot)))SE
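The expression above has no left-hand side. The sketch below assumes it is meant to replace SE with the scaled value, which is an interpretation rather than something the email states.

```python
def apply_dead_money(se, dead_money, pot_size):
    """Scale SE by (1 + dead_money / (2 * pot_size)).

    Assumption: the result is the new SE.
    """
    if pot_size <= 0:
        raise ValueError("pot_size must be positive")
    return (1 + dead_money / (2 * pot_size)) * se
```

With 3 units of dead money in a pot of 6, the multiplier is 1.25, so an SE of 10 would become 12.5.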
Any time it is our turn to act and we are outside the blinds
If it is our 1st time to act XY=1
If we raised our 1st time to act, and it is our second time to act, XY=.5
else XY=0
If (ZY(SE)+ZZ(IO)+ZAA(BHF)+ZAB(BS)ZY- (ZAC)RIO)>ZAD
Raise
If (ZAE(SE)+ZAF(IO)+ZAG(BHF)-ZAH(RIO))>ZAI
Call
Else
Fold
Any time it is our turn to act and we are inside the blinds
If it is our 1st time to act XY=1
If we raised our 1st time to act, and it is our second time to act, XY=.5
else XY=0
If (ZAJ(SE)+ZAK(IO)+ZAL(BHF)+ZAM(BS)ZY-(ZAO)RIO)>ZAP
Raise
If (ZAQ(SE)+ZAR(IO)+ZAS(BHF)-ZAT(RIO))>ZAU
Call
Else
Fold
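Both decision blocks have the same form: a weighted raise score checked first, then a weighted call score, then fold. A hedged Python sketch is below. The weight names in `w` are invented for readability. One further assumption: the email defines XY just before each test but writes the blind steal term as (BS)ZY, so this sketch multiplies BS by XY, which may not be what the author intended.

```python
def xy_factor(times_acted_before, raised_first_time):
    """XY: 1 on our first action, .5 on our second action if we raised
    the first time, 0 otherwise."""
    if times_acted_before == 0:
        return 1.0
    if times_acted_before == 1 and raised_first_time:
        return 0.5
    return 0.0


def decide(se, io, bhf, rio, bs, xy, w):
    """Return "raise", "call", or "fold" from two weighted scores.

    `w` maps hypothetical weight names to the tunable constants
    (ZY..ZAI outside the blinds, ZAJ..ZAU inside them).
    """
    raise_score = (w["se_r"] * se + w["io_r"] * io + w["bhf_r"] * bhf
                   + w["bs_r"] * bs * xy - w["rio_r"] * rio)
    if raise_score > w["raise_min"]:
        return "raise"
    call_score = (w["se_c"] * se + w["io_c"] * io + w["bhf_c"] * bhf
                  - w["rio_c"] * rio)
    if call_score > w["call_min"]:
        return "call"
    return "fold"
```

Using one function with two weight sets avoids duplicating the logic for the inside-the-blinds and outside-the-blinds cases.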
[This is the end of the algorithm part I have been working on. What follows is the text I wrote before I started writing the actual algorithms pertaining to preflop stuff I need to turn into algorithms. This is all unedited and unchanged since I started work on the real algorithms, and hence may be incorrect.]
Bet sizing will be based on size of the pot with modifiers. We will raise less in position and more out of position at a constant rate. There will also be a semi-random component based on our SE and IO. Hands that have high IO and/or SE, with small consideration for BHF, will be raised a slightly larger portion of the pot. I envision taking the higher of the 2 values and adding 1/3 of the other, and then adding a reduced BHF. This should scale with a random variable such that hands with the lowest values and latest position are raised to 3bbs 60% of the time, 3.5bbs 26.6% of the time, and 4bbs 13.3% of the time. For hands with our highest value, it should be 4bbs 100% of the time. For hands with average values, say like KJs from mid position, the distribution should be close to even but tilted slightly towards 4bb. Our average 3bet sizes should be about 15% less than the size of the pot in position and 10% more than the size of the pot out of the bb. All these are for 100bbs. Our raise sizes decrease as the size of the pot relative to stack sizes decreases. The effect should be about 10% from 50bb to 100bb and 10% more from 100bb to 200bb for similar sized pots.
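The combined value and the randomized open size described above might be sketched like this in Python. Only the two endpoint distributions come from the text; how the weights move between those endpoints is not specified, so the sketch takes the weights as an argument instead of guessing. `bhf_scale` is a hypothetical stand-in for the "reduced BHF" factor.

```python
import random

SIZES_BB = (3.0, 3.5, 4.0)
WEAKEST_LATE = (0.60, 0.266, 0.133)   # lowest values, latest position
STRONGEST = (0.0, 0.0, 1.0)           # highest value: always 4bb


def sizing_value(se, io, bhf, bhf_scale=0.25):
    """Higher of SE and IO, plus 1/3 of the other, plus a reduced BHF."""
    higher, lower = max(se, io), min(se, io)
    return higher + lower / 3 + bhf_scale * bhf


def sample_open_size(weights, rng=random):
    """Draw one open-raise size in big blinds using the given weights."""
    return rng.choices(SIZES_BB, weights=weights, k=1)[0]
```

`random.choices` accepts relative weights, so the slightly-under-100% total of the stated percentages does not need normalizing.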
With some tweaking, this should be able to reproduce the sLAG style. Our preflop play will become more advanced and correct as we add to and tweak section 1b.
A note on filters: It is important that we play small pairs correctly preflop. Small pairs are unique because their value lies in the 1/9 chance we flop a set. There are some basic rules for this, like we should always call when the bet to us is 1/15 of stacks or better. Of course, a ton of things modify this rule, especially the implied odds of the opponents’ play style. I have hoped we could model correct low pocket pair play preflop without having to add an extra “filter”. There are a number of things like this throughout our play of a hand, special circumstances that should be added to normal logic of play. I will try to write as few filters as possible, trying to reduce everything to its lowest common denominator. I am still unsure if we will be able to model correct low PP play without a filter. In any case, these filters should be the last thing we write.
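The two numbers in this note are easy to check in code. The sketch below computes the exact chance of flopping a set or better with a pocket pair (about 11.8%, or roughly 1 in 8.5, close to the 1/9 quoted above) and expresses the 1/15 rule as a simple test. The function names are illustrative, and the rule ignores the opponent-dependent modifiers the note mentions.

```python
from math import comb


def flop_set_probability():
    """Chance that at least one of the two remaining cards of our rank
    appears among the three flop cards."""
    misses = comb(48, 3)   # flops with neither remaining card of our rank
    total = comb(50, 3)    # all flops from the 50 unseen cards
    return 1 - misses / total


def meets_set_mining_rule(amount_to_call, effective_stack, ratio=15):
    """True when the call costs at most 1/ratio of the effective stack."""
    return amount_to_call * ratio <= effective_stack
```

A filter built on `meets_set_mining_rule` would be the kind of special case the note hopes to avoid, so it is best read as a baseline to compare the general model against.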
Section 2
Instead of only future modeling which takes place in Section 1, Section 2 will combine some modeling with basic hand reading and immediate equity calculations.
Section 2 will take over when the size of the bet we are facing is greater than or equal to 20% of effective stacks of the lowest stack player who has called or made that bet, or us if we were to call the bet, or a player who has called another bet (not the bet we are facing) that is 35% of his effective stack and he has not folded yet. It will also take over if we plan to make a raise that would be about 12% stack of a 200bb player, 20% stack of a 100bb player, or 35% stack of a 20bb player who has called the latest raise. (note that later I plan to make this scale with stack sizes in BB as well as %of stack)
We will take the action with the highest expected value (EV).
EV folding is 0
EV calling is (equity)*(size of pot after call)*TIO – (amount to call) – (chance we will have to fold this round*amount to call)
EV raising is (Equity when called)*(size of pot when called)*TIO + (% chance all fold)* (size of the pot with our raise) – (amount costs us to raise) – (EV loss when we have to fold after we raise)
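The three EV formulas translate almost directly into Python. The sketch below transcribes them literally, so the called branch of the raise formula is not weighted by the chance of being called, exactly as written. TIO is not defined in this excerpt and is passed through as a plain multiplier; all function and parameter names are assumptions.

```python
def ev_call(equity, pot_after_call, tio, amount_to_call, p_fold_later):
    """EV of calling, as written in the text."""
    return (equity * pot_after_call * tio
            - amount_to_call
            - p_fold_later * amount_to_call)


def ev_raise(equity_when_called, pot_when_called, tio, p_all_fold,
             pot_with_raise, raise_cost, ev_loss_when_folding):
    """EV of raising, as written in the text."""
    return (equity_when_called * pot_when_called * tio
            + p_all_fold * pot_with_raise
            - raise_cost
            - ev_loss_when_folding)


def best_action(call_ev, raise_ev):
    """Pick the action with the highest EV; folding is worth 0."""
    options = {"fold": 0.0, "call": call_ev, "raise": raise_ev}
    return max(options, key=options.get)
```

Because folding is fixed at 0, the bot calls or raises only when one of the other two values is strictly positive.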
We will assign hand ranges to each relevant (defined later) player in the hand by taking the % of time they perform that action for raising (positional pfr for 1st raises, 3 betting graduated for position for 2nd raises) and using our predefined raising table, and 1/2raise%-call% for players who have called.
Thanks for sharing, you’re so right when you say along the lines of ‘seems simple on the surface, but not when you try to implement it’
I got to the end and thought wow there’s so many variables, equations and decisions which go into poker.
Then I realized that this logic decides only 1 decision in 1 variety of poker and that just blew my mind.
Too bad this guy isn’t as good a programmer as he is a cash game player, however.
Jason, yeah, it’s a doozy.
The mind-blowing thing is that human players don’t make decisions like this. They weigh factors, sure, but they don’t perform complex weighted averages in their head to decide whether to call or fold or whatever, which brings about a more fundamental question: how *do* we make decisions?
The task seems not so easy to solve using ordinary if-then logic, even using weights in formulas.
The more I think about the problem the less I understand how to deal with it. However, I’m going to check whether I could write easier sub-models (like playing HU SNG, 9max SNG) for the bot. There are far fewer postflop decisions and this could work at least at low stakes.
When you talk about the different factors humans use to make a decision, it seems that the task could be solved using basic AI algos, but it should be somehow optimized to use a limited amount of initial knowledge.
I think “how *do* we make decisions?” is the wrong question to ask in relation to creating a winning bot strategy while maintaining your sanity. Each individual has his own unique decision making process based on mental models formed over time. That’s the beauty of the human mind.
As I was reading the email, I couldn’t help but be overwhelmed by the complexity of the proposed algorithm. If I was tasked with constructing a winning $200NL full-ring strategy, I would try to simplify things from the start. Humans are good imitators. Instead of trying to create an sLAG strategy from scratch, I would try to mimic the nitty winning strategies being employed by mass-multitablers. I chose the criteria (nitty, winning, mass-multitabler) because I suspect the strategies will be inherently robotic.
Anyways, thank you for continuing to post about your botting endeavors!