Finding Optimal Abstract Strategies in Extensive Form Games
Michael Johanson, Nolan Bard, Neil Burch and Michael Bowling. Finding Optimal Abstract Strategies in Extensive-Form Games. In Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI), 2012.Download
Abstract
Extensive-form games are a powerful model for representing interactions between agents. Nash equilibrium strategies are a common solution concept for extensive-form games and, in two-player zero-sum games, there are efficient algorithms for calculating such strategies. In large games, this computation may require too much memory and time to be tractable. A standard approach in such cases is to apply a lossy state-space abstraction technique to produce a smaller abstract game that can be tractably solved, while hoping that the resulting abstract game equilibrium is close to an equilibrium strategy in the unabstracted game. Recent work has shown that this assumption is unreliable, and an arbitrary Nash equilibrium in the abstract game is unlikely to be even near the least suboptimal strategy that can be represented in that space. In this work, we present for the first time an algorithm which efficiently finds optimal abstract strategies --- strategies with minimal exploitability in the unabstracted game. We use this technique to find the least exploitable strategy ever reported for two-player limit Texas hold'em.
Notes
Since 2007, most of the successful computer poker agents have shared one common technique: abstract game Nash equilibrium approximation. Our goal is to find a close approximation to a Nash equilibrium strategy, but even small poker games like heads-up limit Texas hold'em are far too large to solve. Instead, a state-space abstraction process is used to derive a small game that can be tractably solved. This small game is then solved (using CFR, EGT, fictitious play, a linear program, etc) to approximate an optimal strategy for the small game. This abstract game equilibrium can then be used to choose actions in the large game.
This approach has two flaws. First, the state-space abstraction process is lossy: some strategically relevant information in the real game will not be perfectly represented in the abstract game. This means that we may not be able to closely approximate a Nash equilibrium when we are constrained to pick a strategy in the small game. Second, and more troubling, abstract game Nash equilibria might not be the abstract game strategies with the lowest real game exploitability. Kevin Waugh examined this effect in his paper on Abstraction Pathologies in small poker games, and we further examined this effect in our paper on Accelerated Best Response computation. In our best response paper, we found that running CFR on an abstract game and evaluating the strategy in the real game shows an overfitting effect. As we run CFR, the strategies quickly get less exploitable in the real game, but then start to become more and more exploitable in the real game until a plateau. At the same time, these strategies continue to improve in one-on-one play against a static opponent.
The overfitting effect is due to differences between the abstract game and the real game. If the abstraction is good, then the two games will largely be strategically similar, but will still have differences. As CFR runs, the strategies get better in the abstract game, and that also makes them better in the real game. However, CFR soon starts making adjustments that make the strategies better in the abstract game, but are not appropriate in the real game.
In his Abstraction Pathologies work, Kevin Waugh found just one setting in which abstraction pathologies are guaranteed not to occur, and this setting also prevents the overfitting effect. In a game where one player uses abstraction and the other does not, if we find a Nash equilibrium, the abstracted player's strategy will have the lowest exploitability of any strategy that it is possible to represent inside that abstraction. Note that this abstract strategy may not itself be an abstract game Nash equilibrium: it may (and in practice, does) lose to an equilibrium in its own abstract space. However, it may still be far less exploitable by a best response in the real game.
The trouble with this approach is that solving a game where even just one player does not use abstration was thought to be computationally infeasible. However, in this paper, we present an algorithm that does exactly this, and works well in practice on modern hardware. The new approach, called CFR-BR, combines the Public Chance Sampled CFR algorithm with the Accelerated Best Response technique to form a new game solving algorithm.
In CFR-BR, each iteration involves a game played between an abstracted CFR agent and an unabstracted opponent who always uses a best response to the CFR player. Each iteration has two steps: compute a Best Response for the opponent given the CFR player's strategy, and then do a CFR update to the CFR player using that opponent's best response strategy. As described so far, this approach will still be computationally intractable: representing the unabstracted opponent's strategy will still require too much memory. However, a useful property of best responses is that they can be computed in subgames, unlike CFR strategies which must be computed for the whole tree at once. On each iteration, we can then sample a subgame, for example by sampling a set of flop cards and considering a betting sequence to reach the flop. We can then compute a best response in only this subgame, which requires only a few gigabytes of RAM, and upate the CFR player in the subgame. To bind the subgames together, the unabstracted opponent will use CFR (without abstraction) to learn a strategy from the root of the game to the start of the subgames. This allows us to have a completely unabstracted opponent, without ever having to store their entire strategy at once. The abstracted CFR player's strategy will then converge to the least exploitable strategy that can be represented in the abstraction, and overfitting and abstraction pathologies are avoided entirely.
To empirically test our algorithm, we considered two abstractions of the same size that we used in the 2007 Computer Poker Competition. CFR-BR was able to find strategies that were about 1/3 as exploitable as those found using CFR, and were less exploitable than CFR strategies for an abstraction 100x larger. We also used CFR-BR on the abstraction we used in the 2011 and 2012 Annual Computer Poker Championships to find a strategy exploitable for just 41 mbb/g, one of the first strategies of any kind to break past 100 mbb/g (along with Eric Jackson's Slumbot), and to date, the least exploitable strategy ever published.
Links
- ACPC Forum thread
- The Accelerated Best Response paper, which describes how to find the strategy for the unabstracted opponent, and demonstrates the overfitting effect for the first time.
- The Public Chance Sampled CFR paper, which describes the CFR update rule we use.
- The Abstraction Pathologies paper, which demonstrates how CFR will find suboptimal strategies when abstration is used for both players.
- k-of-N measures, exciting work by Kit Chen and Michael Bowling that uses CFR-BR to design diabetes treatment policies.
BibTeX
@InProceedings( 2012-aaai-cfrbr,
Title = "Finding Optimal Abstract Strategies in Extensive Form Games",
Author = "Michael Johanson and Nolan Bard and Neil Burch and Michael Bowling",
Booktitle = "Proceedings of the Twenty-Sixth Conference on
Artificial Intelligence (AAAI)",
Year = "2012",
AcceptRate = "26\%",
AcceptNumbers = "294 of 1129"
)