A New Kind of Science: The NKS Forum > Applied NKS > a query on Genetic Algorithm problem formulation
Author
Jyotirmoy Dalal

Registered: Nov 2005
Posts: 1

a query on Genetic Algorithm problem formulation

The problem is to locate p emergency service centres for a set of n villages. Each village is represented by a point whose location is given in (x, y) coordinates. A village counts as covered if it lies within 2 km of its nearest centre, and we should try to cover all villages with the chosen service centres.

Let

Number of villages: n = 50, denoted 1, 2, 3, …, 50.

Number of emergency service centres: p = 8.

So, we have to choose 8 villages out of 50 where emergency service centres will be built.

Hence, the number of ways of choosing 8 out of 50 is 50C8 = 536,878,650, which is a huge number.

We plan to take an initial population of randomly generated chromosomes, each with 8 positions holding distinct numbers from 1 to 50. For example:

1 5 9 18 20 8 49 41

2 4 9 32 45 5 8 27 etc.

Then, taking the villages 1, 5, 9, 18, 20, etc. (for the 1st string) as the chosen locations, we determine how many villages are covered by circles of 2 km radius centred at those villages. Since we want to cover as many villages as possible, the objective is to maximize the number of villages covered, and the fitness of a chromosome is that same count.
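As a sanity check on this formulation, here is a minimal Python sketch of the fitness computation (the coordinate dictionary and function name are my own, purely illustrative):

```python
import math

def coverage_fitness(chromosome, coords, radius=2.0):
    """Number of villages lying within `radius` km of at least one
    of the chosen centre villages (the chromosome)."""
    covered = 0
    for v in coords:
        # A village is covered if any chosen centre is within range.
        if any(math.dist(coords[v], coords[c]) <= radius for c in chromosome):
            covered += 1
    return covered
```

With n = 50 and p = 8 this costs O(n·p) distance checks per chromosome evaluation.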

Now, using the GA operators of reproduction, crossover, and mutation, we wish to evolve the population and follow the fitness values over several generations.

At this point, my queries are:

1. Are the fitness function and problem formulation proper, or is any modification required?

2. What values should we take for the probabilities of reproduction, crossover, and mutation?

3. What should the parent generation size be, i.e., how many chromosomes should we begin with?

4. Do you suggest any additions or alterations?

Looking forward to hearing from you soon.

__________________
Jyotirmoy


11-02-2005 09:29 AM
Jason Cawley
Wolfram Science Group
Phoenix, AZ USA

Registered: Aug 2003
Posts: 712

It is not clear you have to use a GA for a problem of that size; it depends on how many instances you have to solve, I suppose. I was able to write a short Mathematica program in an hour or two that does the 50 choose 5 case in less than 4 minutes. From the data points I have, it would take about 16 hours on my Athlon laptop to plow through a single 50 choose 8 case. This was with completely general code, accepting any parameter values, just given a random data set to test; I got just under 10,000 cases per CPU second. With the possibility of using a more serious machine, if one has access to one, that is getting manageable, if you want an exact solution to a particular problem.

Eventually, to be sure, the problem will get too big for exhaustive search. But 50 choose 8 is about half a billion cases, and we regularly employ exhaustive search methods in NKS for problems of that size, or a few orders of magnitude more. If you had to do this thousands of times, that would be another story.

What did I do to get 50C5 to run in less than 4 minutes? First, I precompute the distances and convert each pairwise relation to a 1 or a 0: in range or not. That gives me 50 lists of 50 bits each. I then convert each list into a single integer using FromDigits, so I can take DigitCount of BitOr of a list of 8 such numbers to get the fitness score. (A little trick: pad the left of each with an extra 1 to preserve leading 0s, if you want to count 0s rather than 1s.) One can enumerate the 50C8 choices with the Combinatorica function UnrankKSubset, keeping only the best score and its index number. I added a little routine to display the points and cover circles, along with the best result's score and position list.
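The Mathematica code itself isn't posted, but the same precompute-then-bitmask idea can be sketched in Python, where arbitrary-precision integers stand in for FromDigits/BitOr/DigitCount (no leading-1 padding is needed, since Python ints don't drop leading zeros); the names here are mine:

```python
import math
from itertools import combinations

def precompute_masks(coords, radius=2.0):
    """For each village i, build an integer whose bit (j-1) is set
    iff village j lies within `radius` of village i. Computed once."""
    masks = {}
    for i, pi in coords.items():
        m = 0
        for j, pj in coords.items():
            if math.dist(pi, pj) <= radius:
                m |= 1 << (j - 1)
        masks[i] = m
    return masks

def exhaustive_best(masks, p):
    """Exhaustively score every p-subset of sites. The fitness of a
    subset is the popcount of the OR of its masks: the number of
    villages covered by at least one chosen centre."""
    best_score, best_subset = -1, None
    for subset in combinations(masks, p):
        m = 0
        for s in subset:
            m |= masks[s]
        score = bin(m).count("1")  # int.bit_count() on Python 3.10+
        if score > best_score:
            best_score, best_subset = score, subset
    return best_score, best_subset
```

The point of the precomputation is that each candidate subset costs only p bitwise ORs and one popcount, with no distance arithmetic in the inner loop.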

There are probably ways to optimize the running time further, but using lots of built-in Mathematica functions made it pretty fast to run and straightforward to program (the code is less than a page). The main points: don't recalculate distances, don't use long list operations, don't use real numbers anywhere, compute everything that can be computed once only once, and use bit-optimized code wherever possible. One might get additional speed-ups by being clever about pruning, e.g. abandoning a calculation in progress whenever it is clear partway through that the current best can't be beaten by this candidate, though one has to avoid letting the extra conditionals slow things down.

One thing I noticed looking at the problem is that there are elements of it a GA should be reasonably good at. Good partial solutions, in the form of circles that cover many points, are likely to confer high fitness and therefore to persist into the complete solution.

But I also noticed another aspect of it that might be exploited by an iterative improvement scheme with a little more direction than a GA, or as a bias within a GA. Even in high-scoring cases there are some circles that contribute little to the overall score. Inspecting the diagrams visually, you can often see an obvious improvement: this circle covers only 1-2 points here but could cover 4 over there, without giving up anything else.

So what you'd look for is the least useful site within each genome in your survivor list, and "mutate" that site. Say you have a choose-8 solution with fitness 37. You can ask what happens to the fitness score if you leave out each of the 8, one at a time; the one whose removal drops the score least is the one to change (if several tie, just pick one). Say removing location 5 gives a score of 35 for the remaining 7 sites, and this is the smallest drop. Then "mutate" position 5. This would be a sort of directed iterative improvement, without crossover.
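A sketch of this remove-the-least-useful-site mutation, assuming the bitmask representation described earlier in the thread (one integer per site, whose set bits mark the villages it covers); the helper names are mine:

```python
import random

def or_score(sites, masks):
    """Popcount of the OR of the sites' coverage bitmasks."""
    m = 0
    for s in sites:
        m |= masks[s]
    return bin(m).count("1")

def least_useful(solution, masks):
    """Index of the centre whose removal costs the fewest covered villages."""
    full = or_score(solution, masks)
    drops = [full - or_score(solution[:k] + solution[k + 1:], masks)
             for k in range(len(solution))]
    return drops.index(min(drops))

def directed_mutate(solution, masks, all_sites, rng=random):
    """Replace the least useful centre with a random unused site."""
    k = least_useful(solution, masks)
    new = list(solution)
    new[k] = rng.choice([s for s in all_sites if s not in solution])
    return new
```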

You might also look at going still farther in the directed improvement direction, effectively looking ahead one ply in the space of moves of single circles. Pick the site to change as in the previous paragraph, but instead of changing it randomly, calculate the fitness of each of the 42 alternative positions for that one circle, keeping the rest of the list (the other 7) fixed, and make the highest-fitness move. This involves significantly more calculation per move, but may well find good answers much more rapidly than random changes, for a problem this structured.
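The one-ply lookahead variant might look like this (same assumed bitmask representation as before; names are illustrative):

```python
def or_score(sites, masks):
    """Popcount of the OR of the sites' coverage bitmasks."""
    m = 0
    for s in sites:
        m |= masks[s]
    return bin(m).count("1")

def one_ply_improve(solution, masks, all_sites):
    """Drop the centre whose removal costs the least, then place it at
    whichever unused site yields the highest overall fitness."""
    full = or_score(solution, masks)
    drops = [full - or_score(solution[:k] + solution[k + 1:], masks)
             for k in range(len(solution))]
    k = drops.index(min(drops))
    rest = solution[:k] + solution[k + 1:]
    unused = [s for s in all_sites if s not in solution]
    best = max(unused, key=lambda s: or_score(rest + [s], masks))
    return rest + [best]
```

Applied repeatedly until no move improves the score, this is a greedy local search over single-circle moves.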

As for "reproduction", if you do go with a classic GA, I might keep copies of the best half of the list and replace the bottom half with copies of the first half mutated in the above manner. If you want to try crossover too, you could keep the top third, add a mutated third as above, and let the remaining third be crossovers among the top third. You might also inject some complete randomness in a bottom tier, to keep from getting stuck in local optima. As for population size, I would not think it needs to be huge: enough to get some variation within each of the above subclasses. 100-150 might do (giving 2^5 cases in each of 3-4 classes), and I wouldn't expect increasing the population beyond 1000 to help (that is enough for 2^8 cases in each of 4 classes at each generation); you'd probably just lose on processing time if you pushed it higher.
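A minimal Python sketch of that keep-the-best-half scheme, with plain random single-site replacement standing in for the directed mutation and crossover omitted for brevity (all names and parameter values here are mine, not from the post):

```python
import math
import random

def make_masks(coords, radius):
    """Coverage bitmask per site: bit (j-1) set iff village j is in range."""
    masks = {}
    for i, pi in coords.items():
        m = 0
        for j, pj in coords.items():
            if math.dist(pi, pj) <= radius:
                m |= 1 << (j - 1)
        masks[i] = m
    return masks

def fitness(solution, masks):
    m = 0
    for s in solution:
        m |= masks[s]
    return bin(m).count("1")

def evolve(coords, p=8, radius=2.0, pop_size=100, generations=50, seed=0):
    """Each generation: keep the best half, refill with mutated copies of it
    (one randomly chosen site replaced by a random unused site)."""
    rng = random.Random(seed)
    masks = make_masks(coords, radius)
    sites = list(coords)
    pop = [rng.sample(sites, p) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(s, masks), reverse=True)
        top = pop[: pop_size // 2]
        children = []
        for parent in top:
            child = list(parent)
            k = rng.randrange(p)
            child[k] = rng.choice([s for s in sites if s not in child])
            children.append(child)
        pop = top + children
    best = max(pop, key=lambda s: fitness(s, masks))
    return best, fitness(best, masks)
```

Swapping the random replacement for the directed or one-ply mutation described above only changes the body of the child-generation loop.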

I hope this helps.


11-02-2005 05:32 PM
Jason Cawley
Wolfram Science Group
Phoenix, AZ USA

Registered: Aug 2003
Posts: 712

My colleague Carl Woll at Wolfram Research points out that if the absolute best solution is not required, just a good one fast, the built-in Mathematica function NMaximize can get good results quite rapidly for problems this size. He wrote a routine, half a page with one short helper function, that achieved a 47-out-of-50 covering on a 50-point, 8-center problem with radius 0.2 of the data scale, using 200 iterations of NMaximize, in 8.7 seconds. Given 1000 iterations it eked out one more point, taking 42 seconds to reach 48 out of 50. I'd say we are well below the threshold of needing things like GAs for problems of this size.

That routine works well on problems where the points can readily be covered and the number of circles to place is relatively small. When the number of circles increases and their radius decreases, it misses easy improvements, in the form of circles placed to cover only a single point when they could cover several elsewhere. Hence the directed improvement scheme suggested above still seems promising for larger problems.

Exhaustive search for perfect solutions to small problems when computing time isn't an issue; Carl's NMaximize approach for readily covered problems; and a simple directed improvement scheme for the largest problems with small circles (a small sample of runs, each looking ahead one ply to see where to move its least useful circle, stopping when no move improves its fitness, then keeping the best of the sample). That would seem to be the right battery of methods for this general class of problems, and a lot more of them are tractable using it than you might expect just from the size of the theoretical possibility space.


11-03-2005 05:40 PM
