We advance research on opponent modeling in interesting adversarial environments: environments in which equilibrium strategies are intractable to calculate or undesirable to use. We motivate the need for opponent models by showing how successful opponent modeling agents can exploit both non-equilibrium strategies and strategies that approximate an equilibrium. We examine the requirements for an opponent modeling agent, including the desiderata of a good model.
We develop a new measurement that quantifies how well our model predicts the opponent’s behavior, independently of the performance of the agent in which it resides. We show how this metric can be used to find areas of model improvement that would otherwise have remained undiscovered, and we demonstrate the technique for evaluating opponent model quality in the poker domain. The measurement can also be used to detect occasions when an opponent is not playing an equilibrium strategy, indicating potential opportunities for exploitation.
We introduce the idea of performance bounds for classes of opponent models, present a method for calculating them, and show that these bounds are a function of the environment alone and thus invariant over the set of all opponents an agent may face. We calculate the performance bounds for several classes of models in two domains: high card draw with simultaneous betting and a new simultaneous-move strategy game we developed. We describe how the performance bounds can aid the selection of appropriate classes of models for a given domain, as well as guide the level of effort that should be applied to developing opponent models in those domains.
We expand the set of opponent modeling methods with new algorithms and study their performance empirically in several domains, including full-scale Texas Hold’em poker. We explore methods for improving opponent models that are useful when the set of opponents we may face is unknown or unavailable. Using these techniques, we develop PokeMinn, an agent that learns to improve its performance by observing the opponent, even when the opponent is attempting to approximate equilibrium play. These methods also pave the way for performance optimization using genetic algorithms and efficient model queries using metareasoning.