Strategy Evaluation in Extensive Games with Importance Sampling

Michael Bowling, Michael Johanson, Neil Burch and Duane Szafron. Strategy Evaluation in Extensive Games with Importance Sampling. In Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML), 2008.



Typically agent evaluation is done through Monte Carlo estimation. However, stochastic agent decisions and stochastic outcomes can make this approach inefficient, requiring many samples for an accurate estimate. We present a new technique that can be used to simultaneously evaluate many strategies while playing a single strategy in the context of an extensive game. This technique is based on importance sampling, but utilizes two new mechanisms for significantly reducing variance in the estimates. We demonstrate its effectiveness in the domain of poker, where stochasticity makes traditional evaluation problematic.


This paper introduces a variance reduction technique with some very useful properties. In its simplest form, we can look at a completed set of games between one of our programs (where we know the strategy completely) and a static but otherwise unknown opponent. For each game, we can think about other, imaginary games that would have looked the same at each of the opponent's decisions: for example, games in which we were dealt different cards, folded earlier in the game, or called on the river. If we were dealt different cards, then so long as we had some probability of taking the same actions, the opponent would not have been able to tell the difference and thus would have acted the same way. For each actual game that was played, we can find hundreds of these imaginary observations, and use importance sampling to weight them together to form an unbiased, low-variance estimate of our performance on that hand. The estimator can be combined with other variance reduction techniques, such as DIVAT or MIVAT, or duplicate matches, to get even better estimates.
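To make the importance-sampling weighting concrete, here is a minimal toy sketch. The outcome space, payoffs, and probability distributions below are invented for illustration and are much simpler than the paper's actual estimator over extensive-game histories; the point is only the core mechanism of reweighting samples drawn under one distribution to estimate an expectation under another.

```python
import random

# Toy sketch of the importance-sampling mechanism behind imaginary
# observations. All names and numbers here are hypothetical.
outcomes = ["win", "lose", "draw"]
payoff = {"win": 1.0, "lose": -1.0, "draw": 0.0}
p = {"win": 0.5, "lose": 0.3, "draw": 0.2}  # distribution we want the expectation under
q = {"win": 0.2, "lose": 0.5, "draw": 0.3}  # distribution we actually sample from

def importance_sampling_estimate(n_samples, rng):
    """Estimate E_p[payoff] from samples drawn under q, reweighted by p/q."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.choices(outcomes, weights=[q[o] for o in outcomes])[0]
        total += (p[x] / q[x]) * payoff[x]  # weight corrects for sampling under q
    return total / n_samples

true_value = sum(p[o] * payoff[o] for o in outcomes)  # exact value: 0.2
est = importance_sampling_estimate(100_000, random.Random(0))
```

As long as q assigns nonzero probability wherever p does, the estimate is unbiased and converges to the true expectation; in the paper, the analogous requirement is having some probability of taking the same actions the opponent observed.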

The algorithm also allows for other cases. In the off-policy setting, we can observe strategy A playing against an opponent, and get estimates of how well strategy B would do. This is unbiased so long as A and B have the same support: every game state must be reached with nonzero probability by both strategies, or by neither. If one strategy can reach a state and the other cannot, then bias is introduced. The imaginary observations technique can greatly reduce this bias, as it requires only that the strategies reach the same states as viewed by the opponent. They can reach a state with different sets of private cards, so long as they both have nonzero probability of reaching a "public state" given by the betting and public cards. In our experiments, we found that for strategies with similar but not identical support, the technique still allows for low-variance and low-bias estimates. B's expected value can be estimated more precisely (measured by root mean squared error) by observing A and using this technique than by observing B directly without variance reduction.
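The off-policy case can be sketched in the same toy style. Here strategy A is actually played while strategy B is evaluated, by reweighting each observed action by the ratio of the two strategies' action probabilities. The strategies and payoffs are invented placeholders, and the sketch collapses a full game into a single decision; the paper's estimator applies this idea over whole extensive-game histories.

```python
import random

# Hypothetical off-policy sketch: play A, estimate B's expected payoff.
actions = ["call", "raise", "fold"]
payoff = {"call": 0.5, "raise": 1.5, "fold": -0.5}
A = {"call": 0.5, "raise": 0.3, "fold": 0.2}  # strategy actually played
B = {"call": 0.3, "raise": 0.5, "fold": 0.2}  # strategy being evaluated

def off_policy_estimate(n_games, rng):
    """Unbiased for B's value when B's support is contained in A's."""
    total = 0.0
    for _ in range(n_games):
        a = rng.choices(actions, weights=[A[x] for x in actions])[0]
        total += (B[a] / A[a]) * payoff[a]  # reweight observation toward B
    return total / n_games

true_b = sum(B[x] * payoff[x] for x in actions)  # exact value: 0.8
est_b = off_policy_estimate(100_000, random.Random(1))
```

Note that if B assigned zero probability to an action A sometimes takes, those samples would simply receive weight zero, but if B could take an action A never takes, that action would never be sampled and the estimate would be biased, mirroring the support condition described above.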

The technique can also be used online, to estimate the performance of one or more strategies as a game is being played. In the online setting, a very small bias is introduced (far smaller than the variance), as we don't always know the opponent's cards and can consider imaginary cases where our own cards overlap theirs. In the online setting, we can estimate the utility of each strategy in a set of candidates, and for each game, switch to the one with the highest expected value. We used this technique in the Second Man-vs-Machine Poker Championship.
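The online selection step itself is simple once per-strategy value estimates are available. The sketch below uses placeholder strategy names and hand-written estimates; in the paper's setting, these numbers would come from the imaginary-observation estimator, updated after every game.

```python
# Hypothetical sketch of online strategy switching: before each game,
# play the candidate with the highest current estimated value.
def pick_strategy(estimates):
    """Return the candidate strategy with the highest estimated value."""
    return max(estimates, key=estimates.get)

# Placeholder estimates for three invented candidate strategies.
estimates = {"aggressive": 0.12, "balanced": 0.31, "passive": -0.05}
best = pick_strategy(estimates)  # -> "balanced"
```

Because all candidates are evaluated from the same observed games, no extra play is needed to compare them; the switch costs nothing beyond maintaining the estimates.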



@inproceedings{ 2008-icml-imaginary-observations,
   author = {Michael Bowling and Michael Johanson and Neil Burch and Duane Szafron},
   title = {Strategy evaluation in extensive games with importance sampling},
   booktitle = {Proceedings of the 25th Annual International Conference on Machine Learning (ICML 2008)},
   location = {Helsinki, Finland},
   editor = {Andrew McCallum and Sam Roweis},
   publisher = {Omnipress},
   year = {2008},
   pages = {72--79}
}