Accelerating Best Response Calculation in Large Extensive Games


Michael Johanson, Kevin Waugh, Michael Bowling and Martin Zinkevich. Accelerating Best Response Calculation in Large Extensive Games. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), 2011.



Abstract

One fundamental evaluation criterion of an AI technique is its performance in the worst-case. For static strategies in extensive games, this can be computed using a best response computation. Conventionally, this requires a full game tree traversal. For very large games, such as poker, that traversal is infeasible to perform on modern hardware. In this paper, we detail a general technique for best response computations that can often avoid a full game tree traversal. Additionally, our method is specifically well-suited for parallel environments. We apply this approach to computing the worst-case performance of a number of strategies in heads-up limit Texas hold'em, which, prior to this work, was not possible. We explore these results thoroughly as they provide insight into the effects of abstraction on worst-case performance in large imperfect information games. This is a topic that has received much attention, but could not previously be examined outside of toy domains.


Notes

This paper introduces a fast best response algorithm that lets us find out how much our poker strategies can lose, on average, against a worst-case opponent that knows our strategy. A Nash equilibrium strategy would tie against such a worst-case opponent, losing $0/game on average. Most of our poker research revolves around approximating Nash equilibrium strategies, to try to limit this worst-case loss and hopefully drive it very close to zero.
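The idea of exploitability can be illustrated on a far smaller zero-sum game than poker. The sketch below computes the best-response value against a fixed strategy in rock-paper-scissors; the payoff matrix and strategies are illustrative assumptions, not anything from the paper. A Nash equilibrium strategy (uniform random, here) has exploitability zero, just as described above.

```python
# Exploitability in a tiny zero-sum matrix game (rock-paper-scissors).
# PAYOFF[i][j] is the row player's winnings when row plays i, column plays j.
PAYOFF = [
    [0, -1, 1],   # rock vs. rock / paper / scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

def exploitability(strategy):
    """How much a best-responding opponent wins per game, on average.

    The opponent picks the column that minimizes the row player's
    expected payoff; exploitability is the negation of that value.
    """
    col_values = [
        sum(strategy[i] * PAYOFF[i][j] for i in range(3))
        for j in range(3)
    ]
    return -min(col_values)

nash = exploitability([1/3, 1/3, 1/3])   # ~0: uniform is the equilibrium
rock = exploitability([1.0, 0.0, 0.0])   # 1.0: always-rock loses every game worst-case
```

In poker the same quantity is measured in milliblinds per game (mbb/g) rather than chips, but the definition is identical: the average winnings of an opponent that knows the strategy and responds optimally.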


While approximating Nash equilibrium strategies had been our goal since 2003, actually measuring our progress towards this goal was computationally difficult. The standard best response computation technique, called expectimax, involves a simple tree walk that visits each game state just once, and seems like it should be efficient. But with the 10^18 game states in Heads-Up Limit Texas Hold'em, even if we could visit three billion states per second (the clock speed on a modern CPU), the computation would still take ten years just to evaluate a single strategy.
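The standard expectimax walk can be sketched in a few lines on an invented toy game tree (perfect information here, so no information-set bookkeeping is needed): maximize at the best responder's nodes, average at chance nodes, and weight by the fixed strategy's action probabilities at its nodes. Every node is visited exactly once, which is precisely why the 10^18-state poker tree is out of reach.

```python
# A toy expectimax best-response walk over an explicit game tree.
# The tree encoding ("kind", "children", "probs", "infoset") is an
# illustrative assumption, not the paper's data structure.

def best_response_value(node, strategy):
    kind = node["kind"]
    if kind == "terminal":
        return node["value"]              # payoff to the best responder
    children = node["children"]
    if kind == "responder":               # best responder picks the max
        return max(best_response_value(c, strategy) for c in children.values())
    if kind == "chance":                  # chance averages over outcomes
        return sum(p * best_response_value(c, strategy)
                   for p, c in zip(node["probs"], children.values()))
    # fixed-strategy node: weight children by the strategy's action probabilities
    return sum(strategy[node["infoset"]][a] * best_response_value(c, strategy)
               for a, c in children.items())

tree = {
    "kind": "fixed",
    "infoset": "root",
    "children": {
        "fold": {"kind": "terminal", "value": 1.0},
        "call": {
            "kind": "responder",
            "children": {
                "a": {"kind": "terminal", "value": -2.0},
                "b": {"kind": "terminal", "value": 3.0},
            },
        },
    },
}
strategy = {"root": {"fold": 0.5, "call": 0.5}}
value = best_response_value(tree, strategy)   # 0.5*1 + 0.5*3 = 2.0
```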


While our programs appeared to be getting stronger every year (as they beat the previous years' programs in one-on-one matches), we had no guarantee that we were actually getting closer to Nash equilibrium over time, and no hard numbers to suggest how close any of our strategies were. Instead, we were forced to rely on two measures: the one-on-one performance of our strategies against each other in actual poker games, and the abstract game best response, our distance to equilibrium inside of a state space abstraction. Since CFR converges quickly to an abstract game Nash equilibrium, this second measure was typically very close to zero regardless of the abstraction.


In late 2010, we developed a new, faster best response computation that internally we call Real Game Best Response (RGBR) to differentiate it from abstract game best response. RGBR does the same type of tree walk as the standard best response computation, but calculates the expected values and actions for all of one player's private hands at the same time, while considering all possible private hands that the opponent can hold. This lets us reuse parts of the computation, and use a fast evaluation at terminal nodes. The computation then takes just 76 CPU-days instead of several years. We can then run the computation in parallel across 72 or more CPUs to get results in a day or less.
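The fast terminal evaluation mentioned above can be illustrated with a simplified showdown: rather than scoring one (our hand, opponent hand) pair at a time, we hold a probability for every opponent hand and score all of our hands against that whole distribution in one pass. Sorting by hand rank and keeping running sums replaces the naive O(n^2) pairwise comparison with O(n log n). The ranks and probabilities below are made up, and real poker would additionally have to handle card-removal effects (the opponent cannot hold cards we hold), which this sketch ignores.

```python
# A sketch of vectorized terminal (showdown) evaluation over all hands at once.

def showdown_values(ranks, opp_prob):
    """For each hand i, return P(beat opponent) - P(lose) under opp_prob.

    ranks[i]    -- the hand strength of hand i (higher wins, equal ties)
    opp_prob[i] -- the probability the opponent holds hand i
    """
    n = len(ranks)
    order = sorted(range(n), key=lambda i: ranks[i])
    values = [0.0] * n
    total = sum(opp_prob)
    below = 0.0                            # mass on strictly weaker hands
    i = 0
    while i < n:
        j = i                              # group hands of equal rank (ties push)
        tied = 0.0
        while j < n and ranks[order[j]] == ranks[order[i]]:
            tied += opp_prob[order[j]]
            j += 1
        above = total - below - tied       # mass on strictly stronger hands
        for k in range(i, j):
            values[order[k]] = below - above
        below += tied
        i = j
    return values

# Four hands, opponent equally likely to hold each; the two rank-5 hands tie.
vals = showdown_values([2, 5, 5, 9], [0.25, 0.25, 0.25, 0.25])
# vals == [-0.75, 0.0, 0.0, 0.75]
```

This per-terminal saving, multiplied across every showdown in the tree and combined with reuse of the shared computation, is what brings the total cost down from years to CPU-days.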


With this new algorithm, we were able to revisit all of our research since 2006 and measure how close we were getting to Nash equilibrium over time. As we had hoped, our strategies were indeed steadily getting closer to Nash equilibrium over time, as we solved larger and better abstract games. We were not sure if we could expect this; earlier work by Kevin Waugh had discovered abstraction pathologies, cases where a seemingly "better" abstraction could result in much more exploitable strategies.


In hard numbers, we found that in the 2007 Man-vs-Machine match, which we lost by a small margin, Polaris was beatable for 275.88 milliblinds per game. In the 2008 second Man-vs-Machine match, Polaris won by a small margin and was beatable for 235.294 milliblinds per game: a decent improvement. By the 2010 ACPC, our program Hyperborean was beatable for just 135.427 mbb/g, and by 2011 we were down to 104.41 mbb/g. In 2012, using our CFR-BR game solving algorithm (which combines CFR and RGBR), we have gotten as low as 37 mbb/g.


In the paper, we also cooperated with the authors of several of the 2010 Annual Computer Poker Competition entries to measure their worst-case performance. Here are their results (also in the paper) and some trivial strategies to give context to the numbers.


First, some trivial agents:

  Agent                  Exploitability (mbb/g)
  Always-Fold            750
  Always-Call            1163.48
  Always-Raise           3697.69
  Uniform Random         4400.89
  50% Call, 50% Raise    2598.11

2010 ACPC agents:

  Agent                  Exploitability (mbb/g)
  Hyperborean.IRO        135.427
  Hyperborean.TBR        141.363
  GGValuta               237.330
  Rockhopper             300.032
  GS6.IRO                318.465
  PULPO                  399.387
  Littlerock             421.850

Note that Hyperborean.IRO came in third in this competition, after Rockhopper in first and GGValuta in second. Limiting exploitability is a nice worst-case guarantee, but it doesn't mean that the strategy will win in a one-on-one match.


For the 2012 Computer Poker Symposium at AAAI, we repeated the experiment and evaluated some of the 2012 competitors. Their results are below. In 2012, our entry 'Hyperborean' used the same abstraction as in 2011, and the only change was that we ran CFR for longer. At AAAI 2012 we also presented the CFR-BR paper, and our least exploitable strategy was down to 41 mbb/g.


2012 ACPC agents:

  Agent                  Exploitability (mbb/g)
  Hyperborean            106.035
  ZBot                   249.384
  Littlerock             324.216
  Slumbot                90.8733

Links

  • ACPC Forums thread where I can answer questions about this paper.
  • A 2+2 Forums thread where I discuss the paper and the results, and answer more questions about our research. I post under the name 'FullyCompletely' on 2+2, and my posts start at the top of Page 5.

BibTeX

@InProceedings( 2011-ijcai-abr,
  Title = "Accelerating Best Response Calculation in Large Extensive Games",
  Author = "Michael Johanson and Kevin Waugh and Michael Bowling and Martin Zinkevich",
  Booktitle = "Proceedings of the Twenty-Second International Joint 
               Conference on Artificial Intelligence (IJCAI)",
  Year = "2011",
  Pages = "258--265",
  AcceptRate = "30\%",
  AcceptNumbers = "400 of 1325"
)