## Evolutionary Algorithms: The Little Things (Part 1)

I'm the kind of person who finds himself reading about a new technology or a cool algorithm, and tries to implement it based on the high-level description. Unfortunately, I don't always guess everything correctly, and sometimes the implementation turns out to not work; or it kind of works, but not as well as expected, which can be even worse.

A key example of this for me was when I read about Evolutionary Algorithms (EAs). At the core, it sounds so ingeniously simple:

1. Create a population of individuals
2. Score the individuals based on some performance metric
3. Kill off the weakest performers
4. Create children from the surviving parents
5. If not finished, go to step 2

That's really it, right? I always thought so.
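In code, that loop can be sketched roughly like this. This is a minimal Python sketch, not anyone's canonical implementation; the function name, parameters, and the lower-is-better fitness convention are my own choices for illustration:

```python
import random

def evolve(fitness, init, mutate, pop_size=20, survivors=5, generations=100):
    # 1. Create a population of individuals.
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Score the individuals (here: lower fitness is better).
        population.sort(key=fitness)
        # 3. Kill off the weakest performers.
        parents = population[:survivors]
        # 4. Create children from the surviving parents.
        children = [mutate(random.choice(parents))
                    for _ in range(pop_size - survivors)]
        # 5. If not finished, go to step 2.
        population = parents + children
    return min(population, key=fitness)
```

For instance, calling `evolve` with `fitness=lambda ind: ind[0]**2 + ind[1]**2`, a random-point initializer, and a small Gaussian perturbation as `mutate` should drive the best individual toward (0, 0).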

In 2008, I started working for a pioneer in EAs, and discovered that almost everything I did was slightly wrong. Every time I had to start a new project, I would dive into it head first, burn through the code, compile, run, re-run, get frustrated, and then take that slow walk into his office to admit defeat. After enough examples, illustrations, and metaphors I would finally arrive at the correct implementation. *Gasp! It works!*

When I discovered another knowledge gap last week, I decided that I had better start writing this stuff down. This series is going to be indefinitely long, with each part dedicated to a different little technique or formula that I never would have guessed. The goal is to provide easy-to-read, understandable code samples and maybe some brief explanations of why things should work this way.

This first article covers two very fundamental, very important pieces that I initially got very wrong: Gaussian Mutation and Self-Adaptation.

## Gaussian Mutation

The elevator pitch for EAs usually has some mention about "random mutation" of child genes. Essentially, each child should be similar to its parent, but there should be a little variation.

Consider an example where each individual in the population is a collection of doubles (i.e., each double is a gene). For instance, you're trying to find the minimum value of some function that takes two parameters: F(x,y). You evaluate your individuals, kill the weakest ones, and now have to perform reproduction.

The intuitive approach to me was to simply change one number to another random number. Hence: random mutation!

```csharp
// Assume we are mutating a child's gene.
// The value must be in the range: min <= val <= max
geneValue = random.NextDouble() * (max - min) + min;
```

The code above simply picks a random point in the gene's valid range and sets the child gene to that value.

Sometimes you can find decent solutions with the implementation above. However, mutating genes by picking a uniformly random value like the above, where the parent's gene has no impact on the child's gene, is not going to perform very well. A better approach is to use Gaussian mutation:

```csharp
geneValue = gaussianMutation(parentGeneValue, sigma);
geneValue = clamp(geneValue, min, max);

...

private double gaussianMutation(double mean, double stddev)
{
    double x1 = Rand.NextDouble();
    double x2 = Rand.NextDouble();

    // The method requires sampling from a uniform random of (0,1],
    // but Random.NextDouble() returns a sample of [0,1).
    // Thanks to Colin Green for catching this.
    if (x1 == 0) x1 = 1;
    if (x2 == 0) x2 = 1;

    double y1 = Math.Sqrt(-2.0 * Math.Log(x1)) * Math.Cos(2.0 * Math.PI * x2);
    return y1 * stddev + mean;
}

private double clamp(double val, double min, double max)
{
    if (val >= max) return max;
    if (val <= min) return min;
    return val;
}
```

So what the heck is this code doing? Well, rather than sampling uniformly like I initially did, this approach samples from a Gaussian distribution (A.K.A. normal distribution or bell curve) of possible values, where the mean is the parent gene's value.

You probably remember from your statistics class (or the Wikipedia link) that a Gaussian distribution peaks at the mean and falls off according to its standard deviation, often referred to as its *sigma* for short. By sampling from this distribution, the child is more likely to stay near the parent in the search space, which helps refine the current optimum; at the same time, it occasionally ventures out drastically, which helps escape local optima.
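For comparison, here is the same Box-Muller sampling written as a Python sketch. The function names mirror the C# above but are my own transliteration, not a library API:

```python
import math
import random

def gaussian_mutation(mean, stddev):
    # Box-Muller transform: turn two uniform samples into one
    # normally distributed sample, then scale and shift it to the
    # requested mean and standard deviation.
    x1 = random.random()
    x2 = random.random()
    # random.random() returns [0, 1), but the log needs (0, 1],
    # so remap an exact 0 to 1 (the same fix as in the C# version).
    if x1 == 0.0:
        x1 = 1.0
    if x2 == 0.0:
        x2 = 1.0
    y1 = math.sqrt(-2.0 * math.log(x1)) * math.cos(2.0 * math.pi * x2)
    return y1 * stddev + mean

def clamp(val, lo, hi):
    # Keep the mutated gene inside its valid range.
    return max(lo, min(hi, val))
```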

As a standard rule-of-thumb, a good value for sigma is 1/6 of the range of the variable:

```csharp
sigma = (max - min) / 6d;
```
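The intuition behind the 1/6 rule: the full range then spans six standard deviations, i.e. roughly ±3 sigma around a parent sitting in the middle, so almost all mutations land in bounds before any clamping. A quick Python check, using an assumed example range of [-5, 5] and illustrative variable names of my own:

```python
import random

lo, hi = -5.0, 5.0
sigma = (hi - lo) / 6.0   # the rule of thumb: 1/6 of the range
parent = 0.0              # a gene sitting at the middle of the range

random.seed(42)
samples = [random.gauss(parent, sigma) for _ in range(100_000)]
in_range = sum(lo <= s <= hi for s in samples) / len(samples)
# The range covers +/- 3 sigma around a centered parent, so roughly
# 99.7% of mutations already fall inside it before any clamping.
```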

However, a fixed sigma tends to make it harder to fine-tune once you've found the general region of the global optimum. This is where the next section comes in handy.

## Self-Adaptation

With a fixed sigma, the mutation operator can't really home in on optimal values once we're fairly sure we've found the general peak, because at that point the search needs to become more local. A better approach is to let the algorithm adapt the sigma as the search progresses, so that once the global optimum's hill has likely been found, it can refine the search using smaller mutations. This process is known as self-adaptation.

In a self-adaptive evolutionary algorithm, every gene has its own sigma, which co-evolves along with it.¹ Unlike gene mutation, however, the sigma is mutated multiplicatively, by scaling it with the exponential of a Gaussian sample (a lognormal update):

```csharp
sigma = Math.Max(0, sigma * Math.Exp(gaussianMutation(0, 1)));
geneValue = gaussianMutation(parentGeneValue, sigma);
geneValue = clamp(geneValue, min, max);
```

There are some subtle reasons for this which aren't really worth going into at the moment. Suffice it to say that this distribution provides the right balance of convergence speed and healthy search space coverage.
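For reference, the same update can be sketched in Python. The function name and the `tau` scaling parameter are my own additions (with `tau = 1` this matches the snippet above):

```python
import math
import random

def mutate_self_adaptive(gene, sigma, lo, hi, tau=1.0):
    # Mutate the strategy parameter first: multiply sigma by the
    # exponential of a Gaussian sample (a lognormal step), so it can
    # grow or shrink freely but can never go negative.
    new_sigma = sigma * math.exp(tau * random.gauss(0.0, 1.0))
    # Then mutate the gene itself using the child's new sigma.
    child = random.gauss(gene, new_sigma)
    child = max(lo, min(hi, child))  # clamp to the valid range
    return child, new_sigma
```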

## A Final Note on Parameter Mutation

A lot of people believe that you should have a parameter mutation probability, where you only mutate a gene some percentage of the time. This can work sometimes, but in general it's unnecessary if you're using the systems above. When in doubt, just set your mutation probability to 1 and only change it if you can make some empirical justification for doing so.

## Conclusion

Dealing with Evolutionary Algorithms is a lot like carrying a bunch of water balloons to the roof during a party. If you're careful and you know what you're doing, you can have a ton of fun; if you're reckless, you'll get all wet and people will laugh at you. So please use caution when implementing EAs in your system, and make sure to include the above techniques. You'll almost certainly see improvements in search quality and efficiency -- and hey, I just gave you all the code!

## Footnotes

1. Some people asked for a reference for self-adaptive evolutionary algorithms. The one I'm most familiar with is *Evolutionary Computation: Toward a New Philosophy of Machine Intelligence*. Self-adaptation is discussed on page 159.

## Comments

**nick gogerty** · August 2nd, 2010 - 13:54

hey, just curious. I always heard that mutation played a small part in GAs and, if overly present, could be unstable relative to the sex component of swapping alleles.

Just curious about your findings on that. Also, I suppose you could use a parameter for variable or declining mutation influence, which might be similar to simulated annealing.

**Wesley** · August 4th, 2010 - 03:02

Hi Nick,

Thanks for the feedback.

I don’t have any exact references off hand about mutations vs. sexual reproduction, but I’m sure a little Google Scholar searching could find them. A couple points though:

1. It’s commonly known that for problems like evolving bit strings, N-point crossover is not enough. The reason being that if none of the initial individuals have a certain bit set properly, then you can swap it around as much as you want but it will never get flipped.

2. The original journal paper on the NEAT algorithm does an ablation study where they remove features and compare the results. They show that while some features like speciation help the algorithm drastically, sexual reproduction only helps speed things up slightly. As the authors note: “Thus, it is clear that mating does contribute when it is done right. However, the nonmating version of NEAT is still significantly faster than the other ablations.” The nonmating version also has a 0% failure rate, unlike the other ablations. See the paper for more info (pg. 20-22): http://nn.cs.utexas.edu/keyword?stanley:ec02
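The limitation in point 1 can be seen with a toy sketch (Python, names mine): if no individual in the population has a given bit set, no amount of crossover will ever set it.

```python
import random

def one_point_crossover(a, b):
    # Swap tails of two bit strings at a random cut point.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

random.seed(0)
# A toy population in which no individual has bit 2 set.
population = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 1]]
for _ in range(1000):
    a, b = random.sample(population, 2)
    population.append(one_point_crossover(a, b))
# However much we recombine, bit 2 stays 0 in every individual;
# only mutation can introduce the missing value.
```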

As for self-adaptation, it does seem similar in some respects to simulated annealing or a declining mutation rate. However, this approach lets evolution fan out or collapse the search on its own rather than having to hand tweak parameters like that.

**Peter Lawrey** · August 9th, 2010 - 14:40

Another way to turn a [0,1) distribution into a (0,1] is to do x = 1 - x;

e.g.

```csharp
double x1 = 1 - Rand.NextDouble();
```

**volatile smile** · October 31st, 2010 - 06:20

If you just want the parent to have an effect why not use a triangle dist instead of a Gaussian? Faster computation time.

Also, a great explanation can be found if you google for "Introducing Robo-Scientist". Some Cornell guys, a Radiolab podcast, and a Nature article. The Cornell lab explains symbolic regression and, in doing so, GAs.