Jason Cawley
Wolfram Science Group
Phoenix, AZ USA
Registered: Aug 2003
Posts: 712 |
It is not clear you have to use a GA for a problem of that size. Depends on how many of them you have to do, I suppose. I was able to write a short Mathematica program in an hour or two that does the 50 choose 5 case in less than 4 minutes. From the data points I have, it would take about 16 hours on my Athlon laptop to plow through a single 50 choose 8 case. This was with completely general code, accepting any parameter values, just given a random data set to test. I got just under 10000 cases per CPU second. With the possibility of using a more serious machine if one has access to one, that is getting managable, if you want an exact solution to a particular problem.
Eventually, to be sure, the problem will get too big to employ exhaustive search. But 50 choose 8 is half a billion cases, and we regularly employ exhaustive search methods in NKS for problems of that size, or a few orders of magnitude more. If you have to do this thousand of times that would be another story.
What did I do to get a 50C5 to run in less than 4 minutes? First of all, I precompute the distances and convert to 1s or 0s for each relation, in range or not. That gives me 50 lists of 50 bits each. I then convert each of those into a single integer, using FromDigits. I can then use DigitCount of BitOr of a list of 8 such numbers to get the fitness score. (A little trick there is to pad the left of each with an extra 1, to keep leading 0s, if you want to count 0s rather than 1s). One can enumerate the 50C8 choices with the Combinatorica function UnrankKSubset. Keep only the best score and its index number. I added a little routine to display the points and cover circles, along with the best result's score and position list.
There are probably ways to optimize that further on the running time side, but using lots of built in Mathematica functions made it pretty fast to run and straightforward to program. (The code is less than a page, for example). The point is, don't recalculate distances, don't use long list operations, don't use real numbers anywhere, have everything that can be computed once only get computed once, use bit optimized code wherever possible etc. One might get additional speed ups by being clever about pruning, e.g. dropping calculations in progress whenever it is clear halfway that the current best can't be beaten by this candidate. Though one has to avoid extra conditionals slowing you down there.
One thing I noticed looking at the problem, is there are elements of it that a GA should be reasonably good at. Good partial solutions in the form of circles that cover many points are likely to have high fitness in the partial solutions, and to contribute aka persist in the complete solution.
But I also noticed another aspect of it that might be exploited by an iterative improvement scheme with a little more direction than a GA - or as bias within a GA. There are some circles even in high fitness score cases that contribute little to the overall score. Inspecting the diagrams visually, you can often see an obvious improvement - this circle only covers 1-2 here but could cover 4 over there, without giving up anything else.
So what you'd look for is the least useful subcase within each genome in your survivor list, and "mutate" on that subcase. You have a choose 8 solution with fitness 37, say. You can ask, what happens to the fitness score if I leave out each of the 8, one at a time? The one that drops the least is the one to change. If several drop by the same amount, just pick one. Say removing location 5 gives a score of 35 to the remaining 7 sites, and this is the worst. Then "mutate" position 5. This would be a sort of directed iterative improvement, without crossover.
You might also look at going still farther in that directed improvement direction, effectively "looking ahead" "one ply" in the space of "moves" of single circles. Pick the site to change as in the previous paragraph. But instead of changing it randomly, calculate the fitness of each of the 42 alternate positions of that 1 circle, keeping the rest of the list (the other 7) fixed, and pick the highest fitness move as the one to make with that 1 circle. This involves significantly more calculation per move, but may well find good answers much more rapidly than random changes, for a problem this structured.
As for "reproduction" if you do go with a classic GA, I might keep copies of the best half of the list, and replace the bottom half of it with copies of the first half mutated in the above manner. If you want to try crossover too, you could keep the top third, add a mutated third as above, and let the other third be crossovers among the top third. You might also inject some complete randomness in a bottom tier, to keep it from getting stuck in local optima. As for population size, I would not think it would need to be huge. Enough to get some variation within each of the above subclasses - 100-150 might do (giving 2^5 cases in each of 3-4 classes). I wouldn't expect increasing population beyond 1000 would help. (That is enough for 2^8 cases in each of 4 classes at each generation). You'd probably just lose on processing time if you pushed that higher.
I hope this helps.
Report this post to a moderator | IP: Logged
|