Parallelizing AI testing for the game of Strategy

MIT     6.338 - Parallel Scientific Computing

Nathan Warshauer - wars@mit.edu

Background

    The game of Strategy is a simple one, designed to test not the design of the game engine itself but the deeper strategies that drive the game's movement and location-control AI.  Units are tracked in a three-dimensional space, use a simple explosive ranged projectile as their only attack form, and receive information only about the area they intend to attack.  By equalizing all units with similar sizes and the same attack form, the effects of a losing versus a winning battle are easy to read directly from the game space, and units use this measurement to gauge whether to engage an attack area before they begin battling.  Additionally, units seek to pick up the remains of their destroyed opponents, as these remains constitute the game's resources.  To win, all of an opponent's units must be destroyed and their remains captured by your own units.

    A blackboard strategy is used instead of a hierarchical rules-based strategy in order to allow the AI to adapt to the current game environment.  Rules-based systems can succeed under this game's restricted environment, but it is of more interest to construct a system that adapts effectively to an opponent's strategies and attack patterns.  This lets us test and tune an AI that does not "cheat" and is therefore applicable to more modern RTS games.

Parallelization

    To parallelize the game of Strategy, a different search algorithm and set of weights are given to each player.  Restricting the game space to two dimensions makes the test useless, as there is not enough room to properly exercise the game's AI.  With three dimensions the AI is confronted with three degrees of movement, ensuring that a random search pattern will not win over an intelligent one.

    A distributed-memory version is used because of the close comparisons needed for coefficient tuning.  By running many jobs at a time, only one run is needed to verify that our "winning" weights are indeed a better set than the losing set.  A result of 6-2 on 8 processors, for example, indicates that the side that won 6 times has the better AI.  Any number of game summaries can be examined simultaneously by running the jobs on multiple processors, and different weights can be tested against each other as necessary.
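
    A minimal sketch of this setup is shown below, assuming a hypothetical play_game() stand-in for the real Strategy engine; each MPI rank plays one independent game between the two weight sets and the results are tallied on rank 0, producing a summary like the 6-2 example above.  The function name and scoring convention are assumptions, not the project's actual code.

// Hedged sketch: one independent game per MPI rank, tallied on rank 0.
// play_game() is a placeholder for the real Strategy engine.
#include <mpi.h>
#include <cstdio>
#include <cstdlib>

// Returns 1 if weight set A wins, 0 if weight set B wins (assumption).
static int play_game(int seed) {
    srand(seed);
    return rand() % 2;   // placeholder for a full game simulation
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int a_won = play_game(rank);          // each rank runs its own game
    int a_total = 0;
    MPI_Reduce(&a_won, &a_total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)                        // e.g. "6-2" on 8 processors
        printf("%d to %d\n", a_total, size - a_total);
    MPI_Finalize();
    return 0;
}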

Game AI

    The "winning strategy" for the game of Strategy will be that which destroys the opponent's units and gathers them before the opponent's AI has time to react to the attack.  Thus, it is necessary to ensure that attacks satisfy one of two requirements:

        1) Destroy ALL of the opponent's units in the attacking area.
        2) Destroy units and gather more resources from the attack than it started with.  Essentially, come out ahead.

    To satisfy these two requirements, the AI is always given two scenarios for the attack.  The first scenario weighs the number of the opponent's units against the AI's own and attempts to save at least one of the AI's units in order to gather the resources from the battle; this corresponds to requirement (1).  Its weight is judged against a second scenario (2) in which the opponent is also gathering resources while attacking.  The second scenario considers the units in terms of the total resources each side will put into the battle versus the total attack force each side is theoretically using.
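
    A minimal sketch of how the two scenario scores might be compared is shown below; the structure, field names, and the exact scoring formulas are illustrative assumptions, not the project's actual source.

// Hedged sketch of the two attack scenarios; all names and the exact
// scoring formulas are assumptions, not the project's actual code.
struct AttackArea {
    int    own_units, enemy_units;          // unit counts near the attack point
    double own_resources, enemy_resources;  // resources each side commits
};

// Scenario 1: can we destroy every enemy unit in the area while keeping
// at least one unit alive to gather the remains?
static double scenario_one(const AttackArea &a) {
    return (double)(a.own_units - 1) - (double)a.enemy_units;
}

// Scenario 2: do we come out ahead in resources, assuming the opponent is
// also gathering while it attacks?
static double scenario_two(const AttackArea &a) {
    return a.enemy_resources - a.own_resources;
}

// The attack is engaged only if the weighted combination favours us.
static bool engage(const AttackArea &a, double w1, double w2) {
    return w1 * scenario_one(a) + w2 * scenario_two(a) > 0.0;
}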

    The AI is localized by forcing units to consider only nearby units and resources in their decisions.  Games such as Starcraft and Warcraft use AI of this kind to control units during battle, but do not currently incorporate comparisons with other units in their AI: once the computer picks an attack point, if it is fighting a losing battle it will continue to attack with exactly the same attack force at exactly the same point.  By giving units scalable AIs, larger attack forces gather before beginning an assault on heavily guarded ground, which solves this problem.

    Additionally, the explosive nature of the weapons used in the game ensures that large groups of troops are just as efficient as small groups.  It is therefore important that the AI attack as soon as there is an advantage, since any deviation from this is equivalent to losing troops.  Gathering large groups of troops in useless attack areas is an incentive for opponents to attack those areas, so defensive areas are also used.  These areas are quite difficult in terms of unit interaction and communication, however, so we limit the scope of this project to offensive attacks only.

    This means that the offensive attack points are the only ones controlling their own movement, and they have the majority of the control over the other controlled points.  Each player has a number of attack, defend, and gather points that control the actual units in the game.  By letting the units reference only these points during their decision process, a more accurate model of the game AI is produced, replacing the current model that resembles French and British soldiers marching into combat (slaughter) in lines.  By forcing injured or less offensively oriented units to retreat, defense and offense can be weighted against each other to produce a game with the interplay of an actual battlefield.  This gives different points on the battlefield weights based not on surrounding terrain and unit density, but on their actual relation to the other players' controlled points.  In effect we see the straggler that is slower than the other units, the unit that wanders and is surrounded by enemy units, and most importantly a small force that gathers around an attack point and expands very quickly, amassing enough force to take the point away from another player.
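
    A minimal sketch of this point-driven movement follows, assuming each unit accelerates toward its player's control points in proportion to the points' weights; the structure names, the normalization, and the time step are assumptions made for illustration.

// Hedged sketch: units steer toward their player's control points in
// proportion to the points' weights.  Names and constants are assumptions.
#include <cmath>

struct Vec3 { double x, y, z; };

struct ControlPoint {       // an attack, defend, or gather point
    Vec3   pos;
    double weight;          // adjusted at run time by the blackboard
};

struct Unit {
    Vec3 pos, vel;
};

// Accumulate a weighted pull toward every control point, then step the unit.
void step_unit(Unit &u, const ControlPoint *pts, int npts, double dt) {
    for (int i = 0; i < npts; ++i) {
        Vec3 d = { pts[i].pos.x - u.pos.x,
                   pts[i].pos.y - u.pos.y,
                   pts[i].pos.z - u.pos.z };
        double len  = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z) + 1e-9;
        double pull = pts[i].weight / len;          // stronger when closer
        u.vel.x += pull * d.x / len * dt;
        u.vel.y += pull * d.y / len * dt;
        u.vel.z += pull * d.z / len * dt;
    }
    u.pos.x += u.vel.x * dt;
    u.pos.y += u.vel.y * dt;
    u.pos.z += u.vel.z * dt;
}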

    With three attack points, three defense points, and three gather points to control, the update sequence is very complex and much lengthier than current sequences.  It is therefore important to make the update sequence parallel in order to expedite the testing process.  As one game can take between ten and twenty minutes serially, parallelizing the game allows more than two players to be considered, and an additional player-strength weight can be added.  However, this raises the possibility of a non-discrete test case, so it is explored experimentally only.  The problem arises because a unit can run forever: it will eventually be caught by one player, but with multiple opponents it may escape and be weighted too low to actually be eliminated until it has taken enough resources to again be a threat.  As the basic comparison is between different AIs, the multi-player weights are relatively unimportant in terms of actually finishing off the "last unit".

    The defensive AI used is a simple bait trap: defenders step out of the defended area until an enemy unit enters it.  By placing a unit in the defended area along with the defending units, that bait unit is lost, but all defenders immediately converge on the attacking force.  This strategy forces us to weigh the second requirement (2) more heavily than the first, and therefore shows the need for careful testing of all weights used.  By parallelizing the testing process, any AI can be efficiently tuned and played against any other AI.
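
    A minimal sketch of the bait-trap trigger is shown below; the structure names and the trigger test are assumptions, not the project's actual source.

// Hedged sketch of the bait-trap defense: defenders wait outside the
// defended area and all converge the moment an enemy reaches the bait.
// Structure names and the trigger test are assumptions.
struct Vec3 { double x, y, z; };

struct DefensePoint {
    Vec3   center;      // center of the defended area (where the bait sits)
    double radius;      // radius of the defended area
};

static double dist2(const Vec3 &a, const Vec3 &b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns true once an enemy unit has entered the defended area, at which
// point every defender is ordered to converge on the defense point.
bool trap_sprung(const DefensePoint &d, const Vec3 *enemies, int n) {
    for (int i = 0; i < n; ++i)
        if (dist2(enemies[i], d.center) < d.radius * d.radius)
            return true;
    return false;
}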

    The gathering AI used is a simple seek-and-absorb behavior following an initial pathfinding step.  Although it attempts to traverse ground controlled by defensive points, the path is usually straight, and once a gather point has been reached a unit will remain for a fixed amount of time before leaving.  This ensures that gather points never take precedence over attack and defend points, and that units generated inside gather areas likewise do not leave the area around the point for a fixed amount of time.  This makes it highly desirable to place attack and defend points around the opponent's gather areas, but difficult to find those areas, as they can move as quickly as the units.  Detecting these areas is key to winning the strategic game, and proper control is just as important.
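
    A minimal sketch of the seek-and-absorb gathering behavior as a small state machine follows; the state names and the dwell timer length are illustrative assumptions.

// Hedged sketch of the gathering behavior: seek a resource, absorb it,
// then dwell at the gather point for a fixed time before leaving.
// State names and the timer length are assumptions.
enum class GatherState { Seeking, Absorbing, Dwelling };

struct Gatherer {
    GatherState state = GatherState::Seeking;
    int         dwell_timer = 0;
};

const int DWELL_STEPS = 200;   // fixed stay at the gather point (assumed)

void update_gatherer(Gatherer &g, bool at_resource, bool at_gather_point) {
    switch (g.state) {
    case GatherState::Seeking:
        if (at_resource) g.state = GatherState::Absorbing;
        break;
    case GatherState::Absorbing:
        if (at_gather_point) {             // resource delivered
            g.state = GatherState::Dwelling;
            g.dwell_timer = DWELL_STEPS;
        }
        break;
    case GatherState::Dwelling:
        if (--g.dwell_timer <= 0)          // only then may the unit leave
            g.state = GatherState::Seeking;
        break;
    }
}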

Game Engine

    To model the game effectively without polygons, which is the only way to allow the update sequences to finish in a reasonable amount of time, floating point numbers are used for the x, y, z coordinates of the units.  Initially each player is given a number of units in a line with zero velocity.  The objective of the game is to gather enough units that the opponent cannot withstand an attack.  As there is no unit generation aside from gathering, the main update sequence is to update the attack, defense, and gather points and then update all of the units according to their current status and the weights they give to each of the points.  Injured units prefer to gather and defend, while healthy units go ahead and attack.
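
    A minimal sketch of the overall game loop is shown below; the structure names, sizes, initial layout, and end condition are assumptions, with only the 3000-step cap taken from the trial output later in this report.

// Hedged sketch of the overall game loop: each player's units start in a
// line with zero velocity, and each step updates the control points before
// the units.  Names, sizes, and the initial layout are assumptions.
struct Vec3 { double x, y, z; };
struct Unit { Vec3 pos, vel; int hit_points; };

const int UNITS_PER_PLAYER = 7;
const int MAX_STEPS        = 3000;   // matches the step cap in the trials below

void run_game(Unit units[2][UNITS_PER_PLAYER]) {
    // initial layout: a motionless line of units for each player (assumed)
    for (int p = 0; p < 2; ++p)
        for (int i = 0; i < UNITS_PER_PLAYER; ++i)
            units[p][i] = { { (double)i, 0.0, p ? 10.0 : -10.0 },
                            { 0.0, 0.0, 0.0 }, 6 };

    for (int step = 0; step < MAX_STEPS; ++step) {
        // 1) update the attack, defense, and gather points for each player
        // 2) update every unit according to its status and the point weights
        //    (injured units favor the gather and defend points, healthy
        //     units favor the attack points)
        // 3) stop early once one side has no units or uncaptured remains left
    }
}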

    The game tracks weights associated with the attack, defense, and gather points in order to let the units communicate more effectively.  While units are gathering around an attack point they can increase the weight of the attack point in order to call for reinforcements more quickly.  Similarly, an offensive or defensive point can call a gather point to clean up the resources from a battle.  The rates of change are all adjustable weights, allowing an AI to be fine-tuned so that it manages its resources effectively.
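
    A minimal sketch of this weight bookkeeping follows, assuming each point's weight is nudged upward by calls for help and relaxes back toward a default value at an adjustable rate of convergence; the names and the exact update rule are assumptions.

// Hedged sketch of point-weight adjustment: calls for help raise a point's
// weight, and the weight relaxes back toward its default at an adjustable
// rate of convergence.  Names and the update rule are assumptions.
struct PointWeight {
    double value;            // current weight seen by the units
    double def;              // default weight for this point type
    double rate;             // rate of convergence back to the default
};

// Units gathering around the point call for reinforcements.
void call_for_reinforcements(PointWeight &w, double boost) {
    w.value += boost;
}

// Each step the weight decays toward its default; a smaller rate keeps a
// boosted weight (and the units' response to it) alive for longer.
void relax_weight(PointWeight &w) {
    w.value += w.rate * (w.def - w.value);
}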

    Each gather point has a minimum of one unit and will not release that final unit to the game, ensuring that gather points operate effectively.  When a gather point loses its last unit to battle, it immediately calls for another.  This call is preceded by fixing the gather point in place and placing an attack point directly above it with maximum weighting.  When the gathering unit reaches the location and the gather point again has its minimum of one unit, the attack point is released and the gather point begins moving again.

    Impulse gravity around important events is the method of unit response, pulling points toward battles and resources in the same way that the points functionally pull the units.  This is the mechanism of adaptation, and its coefficient is vital for the game environment.  Because it is a function of the expected number of troops each player has, as well as the amount of resources available, gravity also introduces error by taking the system out of its discrete bounds.  It is therefore important to limit the size of the system in terms of units and area.  The developed AI is useful, for now, primarily in enclosed systems, as testing is far easier in this environment.  Expansion to open systems requires tighter unit control than that developed here; specifically, the system would need to be recursive at some point, which it is not.  For example, it would need to know when to add new attack, defense, and gather points, which it does not.
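
    A minimal sketch of an impulse-gravity update is shown below, assuming each important event (a battle or dropped resources) applies a one-time pull on a control point scaled by the gravity coefficient; the exact form of the pull is an assumption.

// Hedged sketch of impulse gravity: an important event gives a control
// point a one-time pull toward it, scaled by the gravity coefficient.
// The exact form of the pull is an assumption.
#include <cmath>

struct Vec3  { double x, y, z; };
struct Point { Vec3 pos; };

// Apply one gravitational impulse from an event at 'where' with 'strength'.
void apply_impulse(Point &pt, const Vec3 &where,
                   double strength, double gravity_coeff) {
    Vec3 d = { where.x - pt.pos.x, where.y - pt.pos.y, where.z - pt.pos.z };
    double len  = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z) + 1e-9;
    double pull = gravity_coeff * strength / len;  // stronger for nearby, big events
    pt.pos.x += pull * d.x / len;
    pt.pos.y += pull * d.y / len;
    pt.pos.z += pull * d.z / len;
}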

    Additionally, we randomize generated hit points to between 5 and 8 in order to provide a better testing environment, and units make decisions after enemy interactions as well as after a mode timer reaches zero.  The decision to flee only occurs during an attack, as the defense and gathering modes are not offensive and therefore cannot react in that manner.  There are a total of 5 different statuses, and only two consider enemy units during their decision process.  This vulnerability gives more adaptive AI a huge advantage, as fast unit response is key.  We do not consider variable reaction times to events, but allow for more sensitive weighting with smaller convergence coefficients to compensate.  In effect, a slower convergence from a smaller rate of change makes units react more quickly to events and therefore win over larger rates of change.  However, making the rate of change too small also slows the response time, as the weights become too large.  Carefully tuning the weights and rates is therefore extremely important.

Sample Code

Download source code of a simple implementation of the Strategy game.

Results

    The result of this extensive testing is that coefficient tuning is fast and efficient, so other developments such as gravity between attack points and different search patterns can be explored.  The coefficients that are tuned are the rate of change of the point weights, the rate of change of the gravity strength, and the ideal sizes of the attack, defense, and gather groups.  Because different aspects of the game are given weights in a functional decision process controlled locally by the units themselves, all rates of change must be tuned before an ideal AI is produced.  It is therefore extremely advantageous to use parallelizable code to speed up the process, and distributed-memory systems (MPI) solve this problem for us.
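
    A minimal sketch of what the tunable coefficient set might look like is given below; the field names and the sample values are assumptions, chosen only to reflect the 0.02 to 0.001 range discussed later in this section.

// Hedged sketch of the tunable coefficient set.  Field names and the sample
// values are assumptions, reflecting the 0.02-0.001 range discussed below.
struct Coefficients {
    double weight_rate;        // rate of change of the point weights
    double gravity_rate;       // rate of change of the gravity strength
    int    ideal_attack_size;  // ideal attack group size
    int    ideal_defend_size;  // ideal defense group size
    int    ideal_gather_size;  // ideal gather group size
};

// One candidate set, assigned per player (and per MPI rank) during tuning.
const Coefficients CANDIDATE = { 0.012, 0.012, 3, 2, 2 };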

    A sample output from a test of the Strategy AI with equal weights, run on 4 processors simultaneously, is shown below.  This output shows an unfinished game resulting from the maximum step cap.  By using multiple processors in parallel, we get consistent data showing different unique scenarios but the same resulting tie, so we accept the two AI weightings as roughly equivalent.  The only difference was in one of the rates of convergence back to the set weights; no difference in default weights was made.  Thus, for this particular case we assume the rates of convergence were both too high for the two AIs to show any noticeable difference in performance, and we test a lower rate of convergence.

Trial 2 ran for 3000 steps
7 Units for Player 1 vs 7 Units for Player 2
0 2 1 vs 0 2 0 Attacking
0 2 0 vs 1 0 1 Defending
1 0 1 vs 1 1 1 Gathering
Trial 3 ran for 3000 steps
7 Units for Player 1 vs 7 Units for Player 2
0 2 1 vs 0 2 0 Attacking
0 2 0 vs 1 0 1 Defending
1 0 1 vs 1 1 1 Gathering
Trial 1 ran for 3000 steps
7 Units for Player 1 vs 7 Units for Player 2
0 2 1 vs 0 2 0 Attacking
0 2 0 vs 1 0 1 Defending
1 0 1 vs 1 1 1 Gathering
Trial 0 ran for 3000 steps
7 Units for Player 1 vs 7 Units for Player 2
3 0 0 vs 2 0 2 Attacking
1 0 0 vs 0 1 1 Defending
1 1 1 vs 0 0 1 Gathering

    As an example of how we have used parallelization to test more effectively, each processor plays 100 games and the total numbers of games won, lost, and drawn are then computed.  The output is shown below, followed by a sketch of the tallying.

[wars@frontend-0 StrategyProject]$ mpirun n1-7 -np 8 ./Strategy
400 to 347 with 53 draws
[wars@frontend-0 StrategyProject]$ mpirun n1-7 -np 8 ./Strategy
414 to 322 with 64 draws
[wars@frontend-0 StrategyProject]$ mpirun n1-7 -np 8 ./Strategy
420 to 314 with 66 draws
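
    A minimal sketch of the per-processor tallying that produces lines like those above is shown below, assuming a hypothetical play_one_game() that returns +1, -1, or 0 for a win, loss, or draw; it is not the project's actual source.

// Hedged sketch of the batched tallying: each MPI rank plays 100 games and
// the per-rank counts of wins, losses, and draws are summed on rank 0.
// play_one_game() is a placeholder, not the project's actual engine.
#include <mpi.h>
#include <cstdio>
#include <cstdlib>

static int play_one_game(int seed) {      // +1 win, -1 loss, 0 draw (assumed)
    srand(seed);
    int r = rand() % 10;
    return (r < 4) ? 1 : (r < 8) ? -1 : 0;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    long local[3] = { 0, 0, 0 };          // wins, losses, draws on this rank
    for (int g = 0; g < 100; ++g) {
        int r = play_one_game(rank * 100 + g);
        if      (r > 0) ++local[0];
        else if (r < 0) ++local[1];
        else            ++local[2];
    }

    long total[3];
    MPI_Reduce(local, total, 3, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)                        // e.g. "400 to 347 with 53 draws"
        printf("%ld to %ld with %ld draws\n", total[0], total[1], total[2]);
    MPI_Finalize();
    return 0;
}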

    Unfortunately, coefficient tuning is difficult with a system as randomized as the one constructed by the game of Strategy.  By placing more rules in the game, it is possible to give one side an advantage and therefore to see the results of changing coefficient values.  Forcing smaller rates of convergence also increases the game speed drastically, because the reactions to gravitational impulses take precedence over the randomized search pattern and therefore pull the units toward enemy units more quickly.  Unfortunately, because a draw is possible and gathering units are quite capable of destroying themselves in large groups, it is necessary to compensate for the large number of draws by recognizing that the number of draws increases as the coefficients become closer, peaking at around 12% of the total number of games.  The number of draws is therefore an extremely effective gauge of how close two AIs' coefficients are.  An example of the same AI playing itself, apart from a minor difference in the gathering weight's rate of convergence that does not play a part, is shown below:

[wars@frontend-0 StrategyProject]$ mpirun n1-7 -np 8 ./Strategy
396 to 317 with 87 draws
[wars@frontend-0 StrategyProject]$ mpirun n1-7 -np 8 ./Strategy
382 to 347 with 71 draws
[wars@frontend-0 StrategyProject]$ mpirun n1-7 -np 8 ./Strategy
357 to 358 with 85 draws
[wars@frontend-0 StrategyProject]$ mpirun n1-7 -np 8 ./Strategy
344 to 362 with 94 draws
[wars@frontend-0 StrategyProject]$ mpirun n1-7 -np 8 ./Strategy
348 to 367 with 85 draws

    Therefore, to tune the AI it is necessary both to minimize the number of draws against another AI and to maximize the number of wins.  Without introducing some sort of rules structure to conduct a higher-order sweep of the playing field, the best coefficient model appears to keep all values between 0.02 and 0.001.  Between these values the impulse-gravity function works most effectively, whether on attacking, defending, or gathering units.  Specifically, a value of around 0.012 has a very high win rate against untuned coefficients and an even record against other similar AIs.

    To show how much parallel testing speeds up the tuning process, below is an example of an AI being tuned with the rates of convergence fixed, but with a less aggressive AI used in the first case.  The two runs show how testing becomes difficult when game draws are possible, forcing us to reconsider all possible weight interactions, and they show how human tuning can help an AI.

Case 1 : Less aggressive AI with differently weighted coefficients

320 to 376 with 104 draws
349 to 345 with 106 draws
331 to 335 with 134 draws
370 to 329 with 101 draws
342 to 339 with 119 draws
356 to 339 with 105 draws
359 to 342 with 99 draws
392 to 297 with 111 draws
338 to 358 with 104 draws

    With a less aggressive AI, the first scenario and the second both appear to play evenly in case 1.  This is the boundary we run into when tuning the coefficients.

Case 2 : More aggressive AI with differently weighted coefficients

328 to 396 with 76 draws
335 to 395 with 70 draws
362 to 372 with 66 draws
362 to 363 with 75 draws
356 to 371 with 73 draws
342 to 373 with 85 draws
370 to 346 with 84 draws
354 to 356 with 90 draws
359 to 379 with 62 draws
351 to 382 with 67 draws
344 to 375 with 81 draws
340 to 394 with 66 draws
327 to 420 with 53 draws

    With a more aggressive AI, we see that the game has begun to resolve itself in favor of the second AI's weights.  This is in fact true: it wins 12 of the above 13 runs and does have a better set of rates of convergence than the first.  We need to test effectively over 50000 games for each tuning, and therefore use parallel computing to solve this problem.

[wars@frontend-0 StrategyProject]$ time mpirun n1-7 -np 8 ./Strategy
35123 to 38361 with 6516 draws

real    7m40.632s
user    0m0.010s
sys     0m0.000s

Problems

    Some arrays were initially indexed at 1 instead of 0, so an extra array element was added to compensate; indexing was at 1 for the point arrays and for the x, y, z coordinates.  Time constraints allowed this problem to be corrected, so the arrays were subsequently changed back to 0-based indexing and all instances of 1-indexed arrays have been removed.

    There are a lot of draws, so we must compensate by fine-tuning the game engine.  Impulse gravity was also reduced by 50% to 0.2%, with the hope of keeping points separated.  Different gathering weights and other default weights and cutoffs have been tuned to reduce error.  However, it is much easier to account for the draws than to try to eliminate them.

    Given the current game model, because attacking units will first increase the weight of the point they are attacking, it is far more effective to examine the game in terms of the number of units remaining rather than examining previous circumstances.  Because units have localized control and there must be 3 or fewer units remaining for the game to end, both sides killing all but a remaining 1 or 2 units constitutes a draw, simply because the gathering algorithm has not sustained the attack and defense algorithms.  By forcing the gathering point to be more aggressive and seek out resources, it is possible to get a game that does not end at the 3-unit versus 3-unit stage but escalates back to larger numbers.  However, it is better for the testing environment to restrict ourselves to an unintelligent gathering algorithm.  If we introduced a systematic method of retrieval, we would also have to give the attack and defense points similar methods of attack, which would counteract the gravitational and weight-oriented blackboard AI we have developed.  The game would begin to follow rules such as search patterns, which are unimportant and relatively easy to code into a game AI.

Related work

Here are some pages featuring work related to game AI design and testing.