Researchers: Vili Jussila, Jonathan Schaeffer*,
Markian Hlynka* and Kimmo Kaski
*University of Alberta, Canada
The design and tuning of the evaluation function is the most time-consuming part of building a high-performance game-playing program. In this project we investigated whether this arduous task can be made easier by automating the generation of evaluation function weights.
We used temporal difference (TD) learning, a powerful reinforcement learning technique, to tune the weights by playing several thousand training games. To find out whether these learned weights can be successful in a high-performance (world-championship-caliber) program, we used Chinook (the World Man-Machine Checkers Champion) as our test program. Our results so far are very encouraging: the TD-learned set of weights can achieve the same level of play as Chinook's original weights, which were tuned over 5 years by a computer scientist and a checkers expert. In the future we plan to extend the learning so that it also automatically discovers the features needed for the evaluation function.
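To make the idea of tuning weights from training games concrete, the following is a minimal sketch of a TD(λ) weight update for a linear evaluation function. It is an illustration under stated assumptions, not the procedure used with Chinook: the linear form of the evaluation, the names `evaluate` and `td_lambda_update`, the parameters `alpha` and `lam`, and the encoding of the game outcome as a single number are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def evaluate(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation: weighted sum of feature values (an assumed form)."""
    return sum(w * f for w, f in zip(weights, features))


def td_lambda_update(
    weights: Sequence[float],
    positions: Sequence[Sequence[float]],
    outcome: float,
    alpha: float = 0.01,
    lam: float = 0.7,
) -> List[float]:
    """Apply online TD(lambda) updates over one finished game.

    positions: feature vectors of the positions seen by the learner, in order.
    outcome:   final result from the learner's point of view
               (hypothetical encoding, e.g. +1 win, 0 draw, -1 loss).
    alpha:     learning rate (assumed value).
    lam:       trace decay lambda (assumed value); no discounting is applied.
    Returns a new weight list; the input weights are not modified.
    """
    w = list(weights)
    trace = [0.0] * len(w)
    for t, feats in enumerate(positions):
        # For a linear evaluation, the gradient with respect to the
        # weights is simply the feature vector of the position.
        trace = [lam * e + f for e, f in zip(trace, feats)]
        if t + 1 < len(positions):
            # Intermediate step: the target is the next position's evaluation.
            target = evaluate(w, positions[t + 1])
        else:
            # Last position: the target is the actual game result.
            target = outcome
        delta = target - evaluate(w, feats)  # temporal-difference error
        w = [wi + alpha * delta * e for wi, e in zip(w, trace)]
    return w
```

In a training loop of the kind described above, such an update would be applied once per game over several thousand games, with the learner's weights carried from one game to the next.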
Figure 14: Game results of the white (solid line) and black (dotted line) learning players. Each data point represents the number of wins by the learner minus the number of wins by Chinook in 100 games. The white player has an advantage over the black player, which is reflected in its score.