Data Biased Robust Counter Strategies

Michael Johanson and Michael Bowling. Data Biased Robust Counter Strategies. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence and Statistics (AISTATS), 2009.



The problem of exploiting information about the environment while still being robust to inaccurate or incomplete information arises in many domains. Competitive imperfect information games where the goal is to maximally exploit an unknown opponent's weaknesses are an example of this problem. Agents for these games must balance two objectives. First, they should aim to exploit data from past interactions with the opponent, seeking a best-response counter strategy. Second, they should aim to minimize losses since the limited data may be misleading or the opponent's strategy may have changed, suggesting an opponent-agnostic Nash equilibrium strategy. In this paper, we show how to partially satisfy both of these objectives at the same time, producing strategies with favourable tradeoffs between the ability to exploit an opponent and the capacity to be exploited. Like a recently published technique, our approach involves solving a modified game; however the result is more generally applicable and even performs well in situations with very limited data. We evaluate our technique in the game of two-player, Limit Texas Hold'em.


Data Biased Response (DBR) is a practical algorithm for building models of opponents and counter-strategies to exploit them, when all we have is a set of observations of the opponent. In the earlier Restricted Nash Response (RNR) algorithm, we needed to have a complete strategy for the opponent. We could then vary one parameter, p, to produce different counter-strategies that trade off between exploitability (our worst-case performance) and exploitation (how much we beat the target opponent for). RNR solves a modified game between two players, where one player secretly is forced at the start of the game to follow the opponent's strategy with probability p. Otherwise, with probability 1-p, they are free to play as they choose. When we solve this game, the unrestricted player's strategy is a useful counter-strategy to use against the true opponent.

When we only have observations and not a complete strategy, though, RNR has to have a default policy for the opponent to follow. This can lead to overfitting behaviour, where the resulting counter-strategy works well against the model, but not against the real opponent. In the Data Biased Response technique, we instead move this random choice (forced to play like the model, or free to choose any action) inside the tree, and make this choice at each information set. We then vary p at each decision so that it is proportional to our confidence in the accuracy of the opponent model. In cases where we have many observations of the opponent's actions, we use a higher value of p. In the very frequent cases where we have never observed the opponent, we set p to zero and the opponent is free to choose any action. Effectively, this lets us fill in the gaps in our opponent model, by making the assumption that our opponent chooses actions that are most effective given the rest of their strategy and our own counter-strategy. If the opponent's strategy is flawed, then DBR will fill in these gaps with a more effective strategy than they actually use.

This property makes DBR robust to modelling noise and sparsity. If not enough data is provided, it defaults towards a Nash equilibrium. When observations are present, it moves towards exploitive strategies that also limit their worst-case loss. This is an effective combination in practice, and is currently our state-of-the-art technique.



@InProceedings( 2009-aistats-dbr,
  Title = "Data Biased Robust Counter Strategies",
  Author = "Michael Johanson and Michael Bowling",
  Booktitle = "Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS)",
  Year = "2009",
  Pages = "264--271",
  AcceptRate = "30\%",
  AcceptNumbers = "63 of 210"