Python implementation of Fixed-Strategy Iteration CFR (FSICFR) playing Liar’s die
So I have been working through the introductory paper on CFR: An Introduction to Counterfactual Regret Minimization. One of exercises is to implement the Fixed-Strategy Iteration CFR (FSICFR) for the simplified Liar’s dice game - Liar’s die.
Liar’s die rules (see paper for more detail)
A player begins by rolling an s-sided die secretly, observing the roll, and claiming a roll rank. Then the opponent has two possible actions:
- Doubt: The opponent doubts the claim, the roll is revealed and compared against the rank claimed. If the roll rank is greater or equal to the claim rank, the player wins. Otherwise, the opponent wins.
- Accept: The opponent accepts the claim. Without revealing the prior roll rank, the die is given to the opponent, who now takes on the same role as the initial player with one exception. After the die is again secretly rolled and observed, one must make a claim that is higher than the previous claim. Thus, players will take turns secretly rolling the die and making successively higher claims until one player doubts their opponent’s claim, and the roll is checked to see who wins the challenge.
Example
Current | Roll | Type | Action | Explanation |
---|---|---|---|---|
Wong | 1 | claim | 1 | Wong claims rank 1 after rolling a 1. |
Chan | response | accept | Chan accepts Wong’s claim. | |
Chan | 4 | claim | 4 | Chan claims rank 4 after rolling a 4. |
Wong | response | doubt | Wong doubts Chan’s claim. | |
Chan wins since Chan’s claim <= Chan’s roll. |
Notes about implementation
The implementation can be found in the github repo. Since this is a finite, 2-player, zero-sum game, I also implemented the exploitability metric. This is computed as \(\epsilon_{\sigma} = b_1(\sigma_2) + b_2(\sigma_1)\) where \( b_i \) gives the best response value of player \( i \). This turned out to be the most time consuming part since the algorithm has to traverse the entire game tree twice including all the possible player rolls so that it can choose the best response at each decision point.
Results
The figures below show the results after 1M iterations. TBH, I am still in the process of understanding them because some of them are quite puzzling.
We first look at the claim node results. Most of the action probability for the high player rolls make sense where the best action is to claim the rank of the player roll. However, for the scenarios where the previous claims are 3 or 4 and the current roll is 6, the action probability is spread uniformly over both claim rank 5 and 6. One would think that the best action would simply be to claim rank 6 since the player had in fact gotten 6 and claiming rank 5 here would give the opponent the opportunity of doubting and thereby losing the game. I also thought perhaps it is because we never reached this claim node, but this is not true looking the other rolls and the response node results.
For the action probabilities of the lower player rolls, there is a tendency to “lie” and claim ranks that are higher than the player roll. This makes sense since claiming a low rank is almost a guaranteed loss since there is >50% that the opponent will get a higher roll. By claiming a higher rank, at least there is a chance that the opponent will accept and roll.
Moving onto the response node results. We can see that if the previous claim is rank 2 or higher, then the player will always doubt. This kind of make sense because there is a higher chance in these situation that the opponent is simply “bluffing” out of necessity. Interestingly, the player will opt to accept and try to roll a rank 6 if the opponent claims rank 5 at the initial roll. This shows that the player reasons that the opponent must be claiming the truth at these moments.
Lastly, the exploitability metric measured at intervals of 200K iterations shows that the starting player has a significant advantage to winning. This is not surprising given the response node results where we saw that the opponent will pretty much accept any claim at the initial roll and the claim node results which showed a tendency to bluff when the roll rank is low. Faced with an initial claim of rank >=3, the opponent really has little alternatives other than to accept since there is only 2/6 chance of the player bluffing. But after accepting the initial claim, it becomes <=50% chance of getting a roll that is higher than the initial claim.