Tuesday, 11 March 2014

Beyond entropy

Level 5 of intelligence seems to be reflecting the actual definition of entropy on the original paper, so before going any further, we will write it in pseudo-code and embed on it the example of the kart seen in video 1 entry:

1) For each option I can take (going left "value = -5" or going rigth "value = +5")
1.2) Imagine you take this option and simulate so you calculate the ending point after 0.1s.
1.3) From this point on, imagine 100 futures of 5 seconds by iterating:
1.3.1) Take a random decision (value = -5 + random*(-5+5))
1.3.2) Simulate so you get your new position after 0.1s more.
1.3.3) Accumulate the small raced distance so you know the "raced" distance at the end.
1.3.4) Stop when you have simulated 5 seconds in the future.
1.4) Now round the final points (x,y) of all 100 futures to a precision of 5: x=5*round(x/5)
1.5) Discard futures with the same end points so you end with only different futures.
1.6) Score this option with Score=Sum(raced ^2) on all different futures found
2) Normalize the option's scores by:
2.1) Calculate AllScores = the sum. of the scores of all the options
2.2) Divide all option's scores by AllScores.
3) Intelligent decision = Sum. for all options( value * score )

The algortihm looks pretty simple, and apply the most accurate entropy I could get, deciding on the averaged options as in the paper... and it is so general and compact one would think nothing more can be done to make it better.

But there are still some grey parts on the algorthm, for instance: Why, once you have all options scored, you do use a weightened average on them to get the final decision (pseudo-code point 3)?

It is a simple way to do it, and the paper uses it... but in some ocassions, it is some how far from being optimal!

Imagine you are the kart, and you have an Y-shaped bifurcantion in front of you. Your options are -15 (left), -5 (little left), +5 (little right) and +15 (right).

If you go left of right, you drive into one of the two bifurcation's arms, so you get a nice 0.4 score on those two options. May be you found 5 different futures on each of those two options, and each future had a length of 4, so score = 5*4^2 = 5*16 = 80.

But if you do a little turn, left or right, you crash into the corner and so your score is lower: you find less different futures (let say 5) and they are shorter (let say 2 meters), so score = 5 * 2^2 = 20.

So you have scores of (80, 20, 20 and 80) for options (-15, -5, +5, +15), if you normalize scores dividing by 200, you get pairs (option.value, option.score) like (-15, 0.4)(-5, 0.1)(+5, 0.1)(+15, 0.4).

If we plot these four points, and use a bicubic spline to interpolate scores for other options not considered (not -15, -5, 5 nor 15), this is what you get:

And here you can see the problem: the averaged value or intelligent decision is: Averaged_Decision = -15*0.4-5*0.1+5*0.1+15*0.4=0, yes, ZERO, the worst possible one!

In the grah. you can see that 0, the averaged decision, have an interpolated score of about 0.07, while
-15 or +15 have interpolated scores of 0.4, then, why do we choose 0?

In a real kart, it means that, as it approach an Y-shaped bifurcation, as turning all left or right is equally good for it... it does nothing and continue right into the corner in front. Usually, on getting too close to the corner, some small difference between left and right will make the kart final decide to go left, but too late to make a smooth drive: it needs to brake, then turn left. Not optimal.

I tried using the max. value to decide, but it was way too agressive to be usuable, then tried miximng both averaged and maximum values 50%-50%, or 80%-20%... eventually it started to work out somehow, but I needed a way to only activate this "mix it with the maximum decision value" only on Y-shaped situations, as in the other cases using the averaged decision was smoother and better.

By comparing the averaged decision's score with the maximum score on the graph I managed to make a smart averaging that I now call "AI Level 6" (don't worry, I stop on level 6!).

So now you have the averaged decision value (AvgValue = 0 in this example) and its score (AvgScore = 0.07 in the example) and the maximum decision value (MaxValue = -15 or +15, let chose -15 for the example) and its score (MaxScore = 0.4 in the example), so you make the level 6 AI "refined decision" by miximg both values:

AvgCoef:= Min(1, 1 - ((MaxScore-3*AvgScore)/Max(0.001, 3*AvgScore)));
Level 6 decision = MaxValue * (1-AvgCoef) + AvgValue * AvgCoef

Let see how it behaves, in the next video, yellow kart use level 5, whiel white one uses level 6. Notice white one is more "nervous" than usual:

This formula is not THE formula, it is just one that works not so bad and that I choosed to apply for Level 6 AI. It is not always better that the level 5 one, on smooth circuits with slow karts, using just entropy with "AI level 5" is marginally better, but when things get tought, karts has more power they can use, or track is not so smooth, then having level 6 on makes the kart to react to hard situations somehow earlier than standard.

So I recommend you stick to "AI level 5", and just keep in mind that, in some cases, it may be better to help the AI take fast decision by adding some spices like this level 6 I have tried here.


  1. Thanks for taking the time to write the pseudo code out. One question. 

    1.3.1) Take a random decision (value = -5 + random*(-5+5))
    Is the -5 applied to each of the 100 iterations? Or is it just there for the initial position?

    1. You need a random drive, so in every frame, before moving the kart "in your imagination", you need to redefine each of the free params values to sometihng random, so you need to repeat it for every little step you make while constructing the future.

      You then have 3 kind of "taking decision" cases:

      1) First frame on a future.

      Fisrst choose one of the free params of the system, then take one of the available option for this free param.

      In our example, you decide to decide on free param "turning left/right" (instead of accelerating/braking), and then choose between -15, -5, +5 or +15 turning options.

      2) Each subsequente step on this future.

      All free params take a random value between the two "smaller" options, -5 and +5.

      3) And back to reality, the kart takes the intelligent decision.

      Here is when you score each option of each free param by summing scores of all its different futures, and then average so you get the "intelligent" decision for this free param. Repeat on all free params and you have a vector that represent the final intelligent decision, that would change one or all the free params.

  2. Ok. It seems to be what I expected. Just to make sure...
    Step 1: set your initial location = your present location minus 5 (if you are doing the -5 option)
    Step 2: Simulate a random move between two extremes (e.g. -5 and 5) from the location in step 1.
    Step 3: Update you current position = previous simulated position + random move
    Step 4: Repeat starting at Step 2 until you have reached time goal (ex. 10s)

    1. Not exactly.

      Step 1: You have an intial position, then you decide to turn the driving wheel -5 degress, simulate a delta time, and you get into another position.

      Step 2: Take a random decision on turning the driving wheel between -5 and +5, simulate another delta time, and get a new position.

      Step 3: Repeat step 2 until delta times simulated reach 10 seconds.

      So you always decide on "pushing the joystick left or right", or "turn the wheels alittle more to the left or right", not on any real change of position of the kart.

      In the case of accelerate/brake you give or take a little of pressure on the pedals, and the simulation adjust it all.

      Once decided your "push", then you simulate: Sum the decided push of -5 -or random value- to the angle of the wheels, them simulate the physics for a delta time so the kart do move to a new position.

  3. Two other questions:

    1. Assuming you only change the heading during a random simulation and assuming there were no boundaries (i.e. open space, no walls) wouldn't the raced distance always be equal and thus the probability also be the same?

    2. Under what conditions does a simulated future terminate prematurely? I.e. How do we get shorter paths? running into a wall? others?

    1. 1) No, you can drive in small circles or just straight. First option raced small distances compared to second.

      2) While simulation kart moves, I check for the final position is inside or outside the track. If it get outside, I term,inate this future in this point, no more decision or simulations, so the reced distance, in my simulation, is "untill time ends of you get out of the track".

      Also, when a kart crashses with another in a future, both karts stop the futures. So, basically, any "disruptive" event can be used to set the kart to "dead" and it means all activity stops.

      If it happends while imagining a future, the future ends there. In "real life" (not imagining a future but really simulating the kart next move) those events can have an effect of decreasing health or similar (not implemented, but will be).

    2. On the raced distance I seem to still be missing something.

      If we assume the speed remains constant and we use a fixed interval for each iteration:

      let speed = 1m / sec
      let time = 0.1 sec (the length of a single iteration in the simulation)
      Distance travelled = speed * time = 0.1 m
      Num iterations = 5.0 sec / time = 50 iterations

      So you will iterate 50 times and travel 0.1m during each iteration. Which means you will travel 5m (0.1m X 50 iterations). Thus the path length is always 5m, no?

      Perhaps, I have misunderstood something?

    3. When you turn the kart you are loosing some velocity in order to give it to rotation, you loose something in the friction needed to change the course of the kart, so you end up having rotated more, but at the expense of running shorter.

      Just think at running in the higway, you are 140 km/h, then you turn the wheel totally left... what happends? sure you will not be still at 140 km/h in one minute from your decision!

    4. I see. My mistake was to assume the speed is constant. It is not because you are taking these force into account.

  4. I've been going through you code. Would save me some time to know where I should look for these calculations?

    1. Hi Andy, I didn't see your comment before! The piece of code you are looking for is is UListOfFutures.pas, look for "spline" for instance as level 7 uses it and it is commented as "spline" in the code.

      But this level 7 is now deprecated, and a little too complicated to follow as spline formulaes are a little complex for the task.

      Instead I use now a "level 8" algortihm (you will find the code below the code for level 7 on the same .pas file) that uses a "sensitivity" of joystic, so the lower it is, the smaller "options" are checked: instead of using "rotate +/-5º" it uses "rotate +/-3º" if it detect that smaller options are getting higher scoring that "big" ones (I always try two sets of options for each free param, being the smaller 1/3 of the biggest).

      It works out much better than level 7 and is much simplier to program and to understand (you can visualize it by activating in the enus "Debug, Show levels". The 3 levels you will see correspond to Energy, Health and the third is the sensitivity of the main power free param.