NashCoding: Yet Another Artificial Intelligence Blog

24 Jul 2010

Tutorial – Evolving Neural Networks with SharpNEAT 2 (Part 2)

In part 1 of the tutorial, we set up a basic experiment to evolve a neural network to play Tic-Tac-Toe against a couple of hand-coded opponents. In part 2, we're going to create a competitive coevolution experiment where the networks evolve by playing against one another.

Competitive coevolution is an approach to evolving individuals by competing them against each other and using relative fitness scores. This approach is particularly useful in domains where writing a hand-coded opponent is prohibitively expensive. A game like Tic-Tac-Toe may be simple enough that you can write an optimal player, but once you get to more complex games like Checkers or Chess, it becomes almost impossible to write a scripted expert player.1

The current SharpNEAT release does not include any support for coevolution, so we're going to build it from scratch.2

The ICoevolutionPhenomeEvaluator

If you recall from Part 1, we created a custom implementation of IPhenomeEvaluator. This evaluator played a black box against both a random player and an optimal player and then returned a fitness score. We'll stick to the same design by creating the ICoevolutionPhenomeEvaluator:

/// <summary>
/// Represents an evaluator that competes two phenomes against each other.
/// </summary>
public interface ICoevolutionPhenomeEvaluator<TPhenome>
{
    ...
    /// <summary>
    /// Evaluate the provided phenomes and return their fitness scores.
    /// </summary>
    void Evaluate(TPhenome phenome1, TPhenome phenome2, 
                       out FitnessInfo fitness1, out FitnessInfo fitness2);
}

Classes implementing this interface define an Evaluate method that competes phenome1 against phenome2, setting their respective fitness scores at the end.

A Parallel Coevolution Genome List Evaluator

The standard approach in two-player competitive coevolution is to play round-robin style (i.e., exhaustively compete every possible pair of individuals); with a population of N genomes, that means N*(N-1) games per generation, since each ordered pair plays once. Scores are then aggregated into a final fitness score for each individual. To do this, we need to create a custom implementation of IGenomeListEvaluator.

public class ParallelCoevolutionListEvaluator<TGenome, TPhenome>
      : IGenomeListEvaluator<TGenome>
{
    readonly IGenomeDecoder<TGenome, TPhenome> _genomeDecoder;
    readonly ICoevolutionPhenomeEvaluator<TPhenome> _phenomeEvaluator;
    readonly ParallelOptions _parallelOptions;
    ...
    /// <summary>
    /// Main genome evaluation loop with no phenome caching (decode 
    /// on each evaluation). Individuals are competed pairwise against
    /// every other individual in the population.
    /// Evaluation scores are summed to get the final genome fitness.
    /// </summary>
    public void Evaluate(IList<TGenome> genomeList)
    {
        // Create a temporary list of fitness values.
        FitnessInfo[] results = new FitnessInfo[genomeList.Count];
        for (int i = 0; i < results.Length; i++)
            results[i] = FitnessInfo.Zero;

        // Exhaustively compete individuals against each other.
        Parallel.For(0, genomeList.Count, delegate(int i)
        {
            for(int j = 0; j < genomeList.Count; j++)
            {
                // Don't bother evaluating individuals against themselves.
                if (i == j)
                    continue;

                // Decode the first genome.
                TPhenome phenome1 = _genomeDecoder.Decode(genomeList[i]);

                // Check that the first genome is valid.
                if (phenome1 == null)
                    continue;

                // Decode the second genome.
                TPhenome phenome2 = _genomeDecoder.Decode(genomeList[j]);

                // Check that the second genome is valid.
                if (phenome2 == null)
                    continue;

                // Compete the two individuals against each other and get
                // the results.
                FitnessInfo fitness1, fitness2;
                _phenomeEvaluator.Evaluate(phenome1, phenome2, 
                                                       out fitness1, out fitness2);

                // Add the results to each genome's overall fitness.
                // Note that we need to use a lock here because
                // the += operation is not atomic.
                lock (results)
                {
                    results[i]._fitness += fitness1._fitness;
                    results[i]._alternativeFitness += 
                                                  fitness1._alternativeFitness;
                    results[j]._fitness += fitness2._fitness;
                    results[j]._alternativeFitness += 
                                                  fitness2._alternativeFitness;
                }
            }
        });

        // Update every genome in the population with its new fitness score.
        for (int i = 0; i < results.Length; i++)
        {
            genomeList[i].EvaluationInfo.SetFitness(results[i]._fitness);
            genomeList[i].EvaluationInfo.AlternativeFitness =
                                                  results[i]._alternativeFitness;
        }
    }
}

We start by creating an array of fitness scores that will store the overall fitness of each genome. We then do a parallel nested for-loop over the population.3 Each individual is decoded from a genome to its phenotypic representation, and the two phenomes are evaluated against each other. Once the evaluation is finished, we update the genomes' overall fitness scores. When all evaluations are done, we set each genome's final fitness score.
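
If the Parallel.For syntax is unfamiliar, the loop above is logically just two nested for loops (see footnote 3). Here is a simplified sequential sketch of the same body, with the lock omitted since nothing runs concurrently:

// Sequential equivalent of the evaluation loop above, for illustration only.
// (The real evaluator uses Parallel.For and locks 'results' while updating it.)
for (int i = 0; i < genomeList.Count; i++)
{
    for (int j = 0; j < genomeList.Count; j++)
    {
        if (i == j)
            continue; // skip self-play

        TPhenome phenome1 = _genomeDecoder.Decode(genomeList[i]);
        TPhenome phenome2 = _genomeDecoder.Decode(genomeList[j]);
        if (phenome1 == null || phenome2 == null)
            continue; // skip genomes that failed to decode

        FitnessInfo fitness1, fitness2;
        _phenomeEvaluator.Evaluate(phenome1, phenome2, out fitness1, out fitness2);

        // Accumulate both sides of the matchup.
        results[i]._fitness += fitness1._fitness;
        results[i]._alternativeFitness += fitness1._alternativeFitness;
        results[j]._fitness += fitness2._fitness;
        results[j]._alternativeFitness += fitness2._alternativeFitness;
    }
}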

We've now built a general two-player competitive coevolution extension for SharpNEAT. It's time to get back to our specific problem domain.

Creating a Coevolution Evaluator for Tic-Tac-Toe

Our coevolution evaluator will take two black boxes and compete them in a game of Tic-Tac-Toe, with 10 points for a win, 1 point for a draw, and 0 points for a loss.4 The code is actually pretty simple:

public class TicTacToeCoevolutionEvaluator : ICoevolutionPhenomeEvaluator<IBlackBox>
{
    ...

    /// <summary>
    /// Evaluate the two black boxes by playing them against each other in a
    /// game of Tic-Tac-Toe. All permutations of size 2 are going to be
    /// evaluated, so we only play one game: box1 is X, box2 is O.
    /// </summary>
    public void Evaluate(IBlackBox box1, IBlackBox box2,
                        out FitnessInfo fitness1, out FitnessInfo fitness2)
    {
        // box1 plays as X.
        NeatPlayer player1 = new NeatPlayer(box1, SquareTypes.X);

        // box2 plays as O.
        NeatPlayer player2 = new NeatPlayer(box2, SquareTypes.O);

        // Play the two boxes against each other.
        var winner = TicTacToeGame.PlayGameToEnd(player1, player2);

        // Score box1
        double score1 = getScore(winner, SquareTypes.X);
        fitness1 = new FitnessInfo(score1, score1);

        // Score box2
        double score2 = getScore(winner, SquareTypes.O);
        fitness2 = new FitnessInfo(score2, score2);

        // Update the evaluation counter.
        _evalCount++;
    }
}

Our ParallelCoevolutionListEvaluator is going to evaluate all ordered pairs, so we will see both matchups: genome #1 vs. genome #2 and genome #2 vs. genome #1. Thus, we only have to play one game per evaluation, with box1 as X and box2 as O.
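
The getScore helper isn't shown in the snippet above. Here is a minimal sketch of it, assuming TicTacToeGame.PlayGameToEnd returns the SquareTypes of the winning side and that the "empty" value (assumed here to be SquareTypes.N, matching the part 1 code) denotes a draw:

// Sketch of the scoring helper: 10 points for a win, 1 for a draw, 0 for a loss.
// Assumption: SquareTypes.N is the empty/no-winner value, i.e. the game was a draw.
private double getScore(SquareTypes winner, SquareTypes mySquares)
{
    if (winner == mySquares)
        return 10.0;  // win

    if (winner == SquareTypes.N)
        return 1.0;   // draw

    return 0.0;       // loss
}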

A Custom Experiment

Since we're using our own list evaluator, we will need to create our own INeatExperiment. Most of the code is identical to the code from SimpleNeatExperiment in part 1. The only important difference is in the CreateEvolutionAlgorithm method:

public class TicTacToeCoevolutionExperiment : INeatExperiment
{
    ...

    /// <summary>
    /// Create and return a NeatEvolutionAlgorithm object ready for running
    /// the NEAT algorithm/search. Various sub-parts of the algorithm are
    /// also constructed and connected up.
    /// This overload accepts a pre-built genome population and its
    /// associated/parent genome factory.
    /// </summary>
    public NeatEvolutionAlgorithm<NeatGenome> CreateEvolutionAlgorithm(
             IGenomeFactory<NeatGenome> genomeFactory, 
             List<NeatGenome> genomeList)
    {
        // Create distance metric. Mismatched genes have a fixed distance
        // of 10; for matched genes the distance is their weight difference.
        IDistanceMetric distanceMetric = new ManhattanDistanceMetric(1.0, 0.0, 10.0);
        ISpeciationStrategy<NeatGenome> speciationStrategy = 
                  new ParallelKMeansClusteringStrategy<NeatGenome>(
                      distanceMetric, _parallelOptions);

        // Create complexity regulation strategy.
        IComplexityRegulationStrategy complexityRegulationStrategy = 
                  ExperimentUtils.CreateComplexityRegulationStrategy(
                     _complexityRegulationStr, _complexityThreshold);

        // Create the evolution algorithm.
        NeatEvolutionAlgorithm<NeatGenome> ea = 
                  new NeatEvolutionAlgorithm<NeatGenome>(
                    _eaParams, speciationStrategy, complexityRegulationStrategy);

        // Create genome decoder.
        IGenomeDecoder<NeatGenome, IBlackBox> genomeDecoder = 
                  CreateGenomeDecoder();

        // Create a genome list evaluator. This packages up the genome decoder 
        // with the phenome evaluator.
        IGenomeListEvaluator<NeatGenome> genomeListEvaluator = 
                  new ParallelCoevolutionListEvaluator<NeatGenome, IBlackBox>(
                      genomeDecoder, PhenomeEvaluator);

        // Wrap a hall of fame evaluator around the baseline evaluator.
        genomeListEvaluator =
                 new ParallelHallOfFameListEvaluator<NeatGenome, IBlackBox>(
                     50, 0.5, ea, genomeListEvaluator, genomeDecoder,
                     PhenomeEvaluator, _parallelOptions);

        // Initialize the evolution algorithm.
        ea.Initialize(genomeListEvaluator, genomeFactory, genomeList);

        // Finished. Return the evolution algorithm
        return ea;
    }
    ...
}

The first portion of the code sets up the way SharpNEAT performs speciation, complexity regulation, and the basic EvolutionAlgorithm. We aren't toying with those parts (maybe in a future tutorial).

After that, we get to the code that creates our custom list evaluator. If you compare this with the code for SimpleNeatExperiment, you'll see that all we had to do here was replace the type of evaluator we're creating.

Bonus! A Hall of Fame Evaluator

In the above code you probably noticed we are also creating something called ParallelHallOfFameListEvaluator and passing it our original list evaluator. It's common in coevolution experiments to maintain a "hall of fame" where every few generations we save the best individual. Future individuals then have to compete against not just the current population but also the best individuals from previous generations. Let's take a look at the first part of how this evaluator is implemented:

/// <summary>
/// Represents a Hall of Fame evaluator.
/// 
/// Champion genomes are periodically stored to create a Hall of Fame. All successive
/// generations are required to evaluate against both their own population and
/// the hall of fame. A genome's overall fitness is a weighted sum of its
/// fitness against the population and the hall of fame.
/// </summary>
public class ParallelHallOfFameListEvaluator<TGenome, TPhenome>
      : IGenomeListEvaluator<TGenome>
{
    readonly IGenomeDecoder<TGenome, TPhenome> _genomeDecoder;
    readonly ICoevolutionPhenomeEvaluator<TPhenome> _phenomeEvaluator;
    readonly ParallelOptions _parallelOptions;
    readonly uint _generationsPerChampion;
    readonly IGenomeListEvaluator<TGenome> _innerEvaluator;
    readonly double _hallOfFameWeight;

    uint _lastUpdate;
    List<TGenome> _hallOfFame;

    ...

    /// <summary>
    /// Construct with the provided number of generations per champion, weight to the hall
    /// of fame fitness, parallel options, and other parameters. 
    /// </summary>
    public ParallelHallOfFameListEvaluator(uint generationsPerChampion,
                              double hallOfFameWeight,
                              AbstractGenerationalAlgorithm<TGenome> ea,
                              IGenomeListEvaluator<TGenome> innerEvaluator,
                              IGenomeDecoder<TGenome, TPhenome> genomeDecoder,
                              ICoevolutionPhenomeEvaluator<TPhenome> phenomeEvaluator,
                              ParallelOptions options)
    {
        Debug.Assert(hallOfFameWeight >= 0d);
        Debug.Assert(hallOfFameWeight <= 1d);

        _generationsPerChampion = generationsPerChampion;
        _hallOfFameWeight = hallOfFameWeight;
        _innerEvaluator = innerEvaluator;
        _genomeDecoder = genomeDecoder;
        _phenomeEvaluator = phenomeEvaluator;
        _parallelOptions = options;

        _hallOfFame = new List<TGenome>();
        ea.UpdateEvent += new EventHandler(ea_UpdateEvent);
    }

    ...

    private void ea_UpdateEvent(object sender, EventArgs e)
    {
        // Make sure that the event sender is an EA.
        Debug.Assert(sender is AbstractGenerationalAlgorithm<TGenome>);

        // Cast the EA so we can access the current champion.
        AbstractGenerationalAlgorithm<TGenome> ea = 
                         (AbstractGenerationalAlgorithm<TGenome>)sender;

        // Update every few generations.
        if (ea.CurrentGeneration < (_generationsPerChampion + _lastUpdate))
            return;

        // Update the update counter.
        _lastUpdate = ea.CurrentGeneration;

        // Add the genome to the hall of fame.
        _hallOfFame.Add(ea.CurrentChampGenome);

        Console.WriteLine("Hall of Fame Updated. Size: {0}", _hallOfFame.Count);
    }

}

The hall of fame evaluator takes a few parameters worth noting:

  • generationsPerChampion - The number of generations that have to elapse before saving another population champion to the hall of fame.
  • hallOfFameWeight - The weight given to the fitness against the hall of fame, as a percentage of the total weight. For example, if hallOfFameWeight is 0.3, then the final fitness for a genome will be 0.3 * hallOfFameFitness + 0.7 * innerEvaluatorFitness.
  • innerEvaluator - The evaluator whose fitness scores make up the remaining (1 - hallOfFameWeight) portion of the overall fitness.

In the constructor, the hall of fame evaluator subscribes to the evolution algorithm's update event. The EA will trigger the ea_UpdateEvent method periodically, according to whatever UpdateScheme we have defined. We don't override the scheme the user has chosen; instead, ea_UpdateEvent checks whether enough generations have elapsed since the last hall of fame update and, if so, adds the current population champion to the hall of fame.
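
For example, if you want UpdateEvent (and therefore the hall of fame check) to fire at generation boundaries rather than on a timer, you can set a generational update scheme before starting the algorithm. This is only a sketch: it assumes SharpNEAT's UpdateScheme class has a generational (uint) constructor and that the EA exposes an UpdateScheme property, so verify it against your SharpNEAT version.

// In Main, after creating the evolution algorithm: request an UpdateEvent every
// generation. The evaluator itself still only stores a champion once every
// _generationsPerChampion generations.
_ea.UpdateScheme = new UpdateScheme(1u);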

Most of the evaluation code is identical to the coevolution list evaluator:

public class ParallelHallOfFameListEvaluator<TGenome, TPhenome>
      : IGenomeListEvaluator<TGenome>
{
    ...

    /// <summary>
    /// Main genome evaluation loop with no phenome caching (decode on each evaluation).
    /// Individuals are competed pairwise against every champion in the hall of fame.
    /// The final fitness score is the weighted sum of the fitness versus the champions
    /// and the fitness score by the inner evaluator.
    /// </summary>
    public void Evaluate(IList<TGenome> genomeList)
    {
        _innerEvaluator.Evaluate(genomeList);

        // Create a temporary list of fitness values 
        // with the scores of the inner evaluator.
        FitnessInfo[] results = new FitnessInfo[genomeList.Count];
        double innerWeight = 1.0 - _hallOfFameWeight;
        for (int i = 0; i < results.Length; i++)
            results[i] = new FitnessInfo(
                        genomeList[i].EvaluationInfo.Fitness * innerWeight,
                        genomeList[i].EvaluationInfo.AlternativeFitness * innerWeight);

        // Calculate how much each champion game is worth
        double championGameWeight = _hallOfFameWeight / (double)_hallOfFame.Count;
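        // (Before the first champion is stored, _hallOfFame.Count is zero; the
        //  division result is never used because the loop below doesn't execute,
        //  so only the scaled inner-evaluator fitness applies.)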

        // Exhaustively compete individuals against each other.
        Parallel.For(0, genomeList.Count, delegate(int i)
        {
            // Decode the first genome.
            TPhenome phenome1 = _genomeDecoder.Decode(genomeList[i]);

            // Check that the first genome is valid.
            if (phenome1 == null)
                return;

            for (int j = 0; j < _hallOfFame.Count; j++)
            {
                // Decode the second genome.
                TPhenome phenome2 = _genomeDecoder.Decode(_hallOfFame[j]);

                // Check that the second genome is valid.
                if (phenome2 == null)
                    continue;

                // Compete the two individuals against each other and get the results.
                FitnessInfo fitness1, fitness2;
                _phenomeEvaluator.Evaluate(phenome1, phenome2,
                                                       out fitness1, out fitness2);

                // Add the results to this genome's overall fitness.
                // Only index i is written in this loop, so the lock is not
                // strictly required, but we keep it for consistency with
                // the coevolution evaluator.
                lock (results)
                {
                    results[i]._fitness += fitness1._fitness * championGameWeight;
                    results[i]._alternativeFitness += 
                                  fitness1._alternativeFitness * championGameWeight;
                }
            }
        });

        // Update every genome in the population with its new fitness score.
        for (int i = 0; i < results.Length; i++)
        {
            genomeList[i].EvaluationInfo.SetFitness(results[i]._fitness);
            genomeList[i].EvaluationInfo.AlternativeFitness = 
                                                          results[i]._alternativeFitness;
        }
    }

    ...
}

There are a couple of differences from the coevolution list evaluator. Rather than setting the overall fitness scores to zero at the start, we set them equal to the weighted value of the inner evaluator's fitness scores. Also, our inner loop is over the hall of fame rather than the population. The rest of the code then functions just like the coevolution evaluator, but adds weighted scores for each individual evaluation. For example, with hallOfFameWeight = 0.5 and four champions stored, each champion game contributes its score times 0.125, and the inner (round-robin) fitness contributes half of its original value.

Running the Experiment

Surprisingly, our actual main Program file is almost exactly the same. The only alteration we have to make is that we're creating a different type of experiment (TicTacToeCoevolutionExperiment):

class Program
{
    static NeatEvolutionAlgorithm<NeatGenome> _ea;
    const string CHAMPION_FILE =
                 @"..\..\..\NeatTicTacToe\bin\Debug\coevolution_champion.xml";

    static void Main(string[] args)
    {
        // Initialise log4net (log to console).
        XmlConfigurator.Configure(new FileInfo("log4net.properties"));

        // Experiment classes encapsulate much of the nuts
        // and bolts of setting up a NEAT search.
        TicTacToeCoevolutionExperiment experiment = 
                                      new TicTacToeCoevolutionExperiment();

        // Load config XML.
        XmlDocument xmlConfig = new XmlDocument();
        xmlConfig.Load("tictactoe.config.xml");
        experiment.Initialize("TicTacToe", xmlConfig.DocumentElement);

        // Create evolution algorithm and attach update event.
        _ea = experiment.CreateEvolutionAlgorithm();
        _ea.UpdateEvent += new EventHandler(ea_UpdateEvent);

        // Start algorithm (it will run on a background thread).
        _ea.StartContinue();

        // Hit return to quit.
        Console.ReadLine();
    }

    static void ea_UpdateEvent(object sender, EventArgs e)
    {
        Console.WriteLine(string.Format("gen={0:N0} bestFitness={1:N6}",
                        _ea.CurrentGeneration, _ea.Statistics._maxFitness));

        // Save the best genome to file
        var doc = NeatGenomeXmlIO.SaveComplete(
                       new List<NeatGenome>() { _ea.CurrentChampGenome }, false);
        doc.Save(CHAMPION_FILE);
    }
}

The program saves the best genome to the GUI project's bin\Debug directory, so you can launch the game and easily load the champion XML file.
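
If you want to load that champion back into your own code (say, to pit it against a human player outside the GUI), a minimal sketch using the experiment's own genome factory and decoder might look like the following. NeatPlayer and SquareTypes are the helper types from part 1, and the exact NeatGenomeXmlIO call is worth verifying against your SharpNEAT version:

// Sketch: load the saved champion genome and decode it into a playable IBlackBox.
// Assumes 'experiment' has already been Initialize()'d, as in Main above.
List<NeatGenome> champions;
using (XmlReader xr = XmlReader.Create(CHAMPION_FILE))
{
    champions = NeatGenomeXmlIO.ReadCompleteGenomeList(xr, false,
                    (NeatGenomeFactory)experiment.CreateGenomeFactory());
}

IBlackBox championBox = experiment.CreateGenomeDecoder().Decode(champions[0]);
NeatPlayer championPlayer = new NeatPlayer(championBox, SquareTypes.X);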

Conclusion

That's it! In this tutorial, we saw how to create a two-player competitive coevolution experiment in SharpNEAT. We created an experiment to evolve a Tic-Tac-Toe AI by playing individuals in the population against each other. By now you know how to set up a coevolution experiment, create custom population evaluators, and handle update events.

If you played against our AI, you probably noticed it (still) sucks. That's (still) okay! The coevolution player is definitely at least as good as the player from part 1, and we didn't have to hand-code any opponent strategies, so I'm calling it a success. The next part of this tutorial will introduce HyperNEAT, a powerful extension to the NEAT algorithm that enables us to leverage the inherent geometry and symmetry of the game board. The result will (hopefully) be a stronger AI opponent.

Source code for part 2 of the tutorial is available here.

Update July 25, 2010: I implemented a second approach to competitive coevolution. This approach, called Host-Parasite coevolution, has two populations evolving simultaneously. Fitness scoring is then a function of performance against the other population (called the parasite population) and a hall of fame of previous generational champions. The project doesn't cover any new features of SharpNEAT, so I'm not going to make a full write-up on it. If you want to read more about the approach, see this paper. Updated source code is available here.

Footnotes
  1. For an excellent book on evolving neural networks to play checkers, check out Blondie24. Full disclosure: it was written by my current boss, but he's only my boss because I read it and it inspired me to go into AI. 😀
  2. For simplicity, we're going to assume that "coevolution" means two-player competitive coevolution. In general, coevolution is not limited to two players and you can even have cooperative coevolution. I just really didn't want to create a bunch of classes that start with TwoPlayerCompetitiveCoevolution.
  3. If you're not familiar with the Task Parallel Library, don't worry. This code is equivalent to two nested for loops.
  4. As I noted in part 1, remember that fitness scores in NEAT always have to be positive.
Comments
  1. Way to go Wesley! Awesome article! Haven’t had time to read the blondie book yet but I’ll have to check it out. I noticed that its available on Amazon!

    Compliments on the very clear coding style.

  2. Thanks Chris. Glad you found it helpful. :)

  3. Great tutorial, very helpful. I’m a bit confused as to why the evolved player is so bad for such a simple game considering that Blondie does so much better at a much more complicated game on a static topology. Any explanations?

  4. @Darryl, good question. I’m still investigating the issue. It’s possible that the coevolution is getting stuck in a local optimum (e.g., something similar to “always defect” in the iterated prisoner’s dilemma). The next article in this series will include a more in-depth analysis of the approach, including an experiment that evolves against an exhaustive evaluator that explores the entire game tree and scores based on the worst possible outcome. I’m hoping that will help shed some light on the problem.

  5. Thanks, Wesley, for the excellent two-part tutorial. I’ve been interested in implementing a HyperNEAT experiment for some time, but the examples included in the libraries were not sufficiently documented. These tutorials are perfect for quickly bringing one up to speed. As my goal lies in HyperNEAT, not just NEAT, I am curious about when you will be adding the third tutorial incorporating the substrate layer. Again, thanks for sharing your work and advice.

  6. @Scott, glad they’ve helped you out. :)

    I got really sidetracked with part 3, because I expected the HyperNEAT coevolution to reach a Nash equilibrium. It turns out that it didn’t really improve the model at all, for reasons I’m still not entirely sure about. As it’s been months now and I still haven’t gotten around to figuring that out, I’ll go ahead and just release a very brief part 3 which covers HyperNEAT.

    Look for that later today. Thanks for reminding me.

  7. Thanks for posting this, Wesley! I was using SharpNEAT v1 to do coevolution. These articles made the transition a lot easier.

  8. Many Many thanks.


