Regret Minimization in Games with Incomplete Information


Martin Zinkevich, Michael Johanson, Michael Bowling and Carmelo Piccione. Regret Minimization in Games with Incomplete Information. In Advances in Neural Information Processing Systems 20 (NIPS 2007), pages 1729-1736, 2008.

Abstract

Extensive games are a powerful model of multiagent decision-making scenarios with incomplete information. Finding a Nash equilibrium for very large instances of these games has received a great deal of recent attention. In this paper, we describe a new technique for solving large games based on regret minimization. In particular, we introduce the notion of counterfactual regret, which exploits the degree of incomplete information in an extensive game. We show how minimizing counterfactual regret minimizes overall regret, and therefore in self-play can be used to compute a Nash equilibrium. We demonstrate this technique in the domain of poker, showing we can solve abstractions of limit Texas Hold’em with as many as 10^12 states, two orders of magnitude larger than previous methods.

Notes

This was the first paper to introduce Counterfactual Regret Minimization (CFR), which quickly became the foundation of all of the University of Alberta Computer Poker Research Group's later work. This algorithm provides an efficient approach for approximating Nash equilibrium strategies in very large extensive games, allowing us to tackle much larger games than was possible using the standard linear programming technique. Strategies generated with CFR were used in the 2007 Annual Computer Poker Competition and 2007 Man-vs-Machine Poker Championship matches.


This paper is also described as part of my Master's thesis. In that work, I attempted to describe the algorithm's theory and implementation in a less formal but more intuitive way. The thesis also describes the state-space abstraction process that is required to simplify Texas Hold'em games to the point where CFR can be applied, and thus gives an end-to-end view of the steps required to make a strong poker agent.


If you are interested in implementing CFR, I would suggest reading our more recent work on Public Chance Sampled CFR, as it gives a more intuitive explanation of optional sampling schemes and provides a much more useful pseudocode description of the algorithm. The pseudocode in this paper isn't very useful, and doesn't mention the critical step of updating the average strategy.
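To illustrate the averaging step mentioned above, here is a small sketch of the per-information-set bookkeeping that a CFR implementation typically maintains: regret matching over cumulative counterfactual regrets, plus the accumulated strategy sum whose normalization is the approximate equilibrium. All names here are my own illustration, not the paper's notation, and the tree traversal that computes the counterfactual action values is omitted.

```python
def regret_matching(regrets):
    """Return a strategy proportional to the positive cumulative regrets."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total > 0:
        return [p / total for p in positives]
    # No positive regret yet: fall back to the uniform strategy.
    n = len(regrets)
    return [1.0 / n] * n


class InfoSetNode:
    """Cumulative regrets and average-strategy sum for one information set."""

    def __init__(self, num_actions):
        self.cum_regret = [0.0] * num_actions
        self.cum_strategy = [0.0] * num_actions  # weighted by reach probability

    def current_strategy(self):
        return regret_matching(self.cum_regret)

    def update(self, action_values, reach_prob):
        """Accumulate counterfactual regret and the average-strategy sum.

        action_values[a] is the counterfactual value of taking action a at
        this information set; reach_prob is the updating player's own
        probability of reaching it under the current strategy profile.
        """
        strategy = self.current_strategy()
        node_value = sum(s * v for s, v in zip(strategy, action_values))
        for a, v in enumerate(action_values):
            self.cum_regret[a] += v - node_value
            # The critical averaging step: the approximate equilibrium is the
            # normalized cum_strategy, NOT the final current_strategy().
            self.cum_strategy[a] += reach_prob * strategy[a]

    def average_strategy(self):
        total = sum(self.cum_strategy)
        if total > 0:
            return [s / total for s in self.cum_strategy]
        n = len(self.cum_strategy)
        return [1.0 / n] * n
```

The point of the sketch is the distinction between the two accumulators: the current strategy from regret matching oscillates from iteration to iteration, and only the reach-weighted average converges to a Nash equilibrium in self-play.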


Links

  • ACPC Forums thread where I can answer questions about this paper.
  • My MSc Thesis, which summarizes the CFR and RNR papers.
  • The Monte-Carlo CFR (MCCFR) paper by Lanctot et al., which generalizes the "poker specific" sampling technique in this paper to a broad family of sampling techniques. This gives us many ways to tune CFR for faster convergence, depending on the game that we're applying it to.
  • The Public Chance Sampled CFR paper, which is a variant of MCCFR that converges even faster in many games, including heads-up limit Texas hold'em. I think this paper has the best description of CFR for someone who wants to implement it.

BibTeX

@incollection{2007-nips-cfr,
  title = {Regret Minimization in Games with Incomplete Information},
  author = {Martin Zinkevich and Michael Johanson and Michael Bowling and Carmelo Piccione},
  booktitle = {Advances in Neural Information Processing Systems 20 (NIPS)},
  editor = {J.C. Platt and D. Koller and Y. Singer and S. Roweis},
  publisher = {MIT Press},
  address = {Cambridge, MA},
  pages = {1729--1736},
  year = {2008}
}