Efficient Nash Equilibrium Approximation through Monte Carl Counterfactual Regret Minimization

Michael Johanson, Nolan Bard, Marc Lanctot, Richard Gibson and Michael Bowling. Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization. In Proceedings of the Eleventh International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012.

One of three finalists for the Best Paper award at AAMAS 2012.



Recently, there has been considerable progress towards algorithms for approximating Nash equilibrium strategies in extensive games. One such algorithm, Counterfactual Regret Minimization (CFR), has proven to be effective in two-player zero-sum poker domains. While the basic algorithm is iterative and performs a full game traversal on each iteration, sampling based approaches are possible. For instance, chance-sampled CFR considers just a single chance outcome per traversal, resulting in faster but less precise iterations. While more iterations are required, chance-sampled CFR requires less time overall to converge. In this work, we present new sampling techniques that consider sets of chance outcomes during each traversal to produce slower, more accurate iterations. By sampling only the public chance outcomes seen by all players, we take advantage of the imperfect information structure of the game to (i) avoid recomputation of strategy probabilities, and (ii) achieve an algorithmic speed improvement, performing O(n^2) work at terminal nodes in O(n) time. We demonstrate that this new CFR update converges more quickly than chance-sampled CFR in the large domains of poker and Bluff.


Counterfactual Regret Minimization (CFR) is a state-of-the-art algorithm for solving (i.e., finding Nash equilibria) very large extensive games. The original CFR paper introduced a deterministic CFR algorithm that doesn't use sampling, and a "poker-specific" sampling technique, now called Chance Sampling, that provably converges, and in practice converges very quickly.

In this paper, we introduce three new sampling variants of CFR: Self Public Chance Sampling (SPCS), Opponent Public Chance Sampling (OPCS) and Public Chance Sampling (PCS). These techniques do less sampling than Chance Sampling and do more work per iteration, more accurate work per iteration, or both. In particular, we show that PCS can converge much more quickly than CS in some games, including poker games such as heads-up limit Texas hold'em, and in the dice game Bluff.

This paper should be a good starting point for people reading about CFR for the first time, and a good resource for people that would like to make a first implementation. The pseudocode given in this paper is clearer than in the original CFR paper.



@COMMENT The Public Chance Sampling CFR (PCS) paper
@COMMENT Compares chance sampling (CS), PCS, and SPCS and OPCS
@COMMENT in heads-up limit games.  Introduces [2-1] and [2-4] 
@COMMENT hold'em for the first time.
@InProceedings( 2012-aamas-pcs,
  Title = "Efficient Nash Equilibrium Approximation through
           Monte Carlo Counterfactual Regret Minimization",
  Author = "Michael Johanson and Nolan Bard and Marc Lanctot and 
            Richard Gibson and Michael Bowling",
  Booktitle = "Proceedings of the Eleventh International Conference 
               on Autonomous Agents and Multi-Agent Systems (AAMAS)",
  Year = "2012",
  AcceptNumbers = "137 of 671",
  AcceptRate = "20\%"