1) For each option I can take (going left "value = -5" or going right "value = +5")
1.2) Imagine you take this option and simulate so you calculate the ending point after 0.1s.
1.3) From this point on, imagine 100 futures of 5 seconds by iterating:
1.3.1) Take a random decision (value = -5 + random*(5-(-5)), i.e. uniform between -5 and +5)
1.3.2) Simulate so you get your new position after 0.1s more.
1.3.3) Accumulate the small raced distance so you know the "raced" distance at the end.
1.3.4) Stop when you have simulated 5 seconds in the future.
1.4) Now round the final points (x,y) of all 100 futures to a precision of 5: x=5*round(x/5)
1.5) Discard futures with the same end points so you end with only different futures.
1.6) Score this option with Score=Sum(raced ^2) on all different futures found
2) Normalize the option's scores by:
2.1) Calculate AllScores = the sum of the scores of all the options
2.2) Divide each option's score by AllScores.
3) Intelligent decision = Sum over all options( value * score )
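The pseudo-code above can be sketched in Python roughly as follows. This is only a minimal illustration, not the original implementation: the `simulate` function is a hypothetical toy kart (constant speed, no track, no crashes) invented for this sketch, and the names `score_option` and `intelligent_decision`, the steering gain, and the fallback for an all-zero score are assumptions as well. The raced distance is accumulated from the point reached in step 1.2.

```python
import math
import random

DT = 0.1         # simulation step in seconds (pseudo-code steps 1.2 and 1.3.2)
HORIZON = 5.0    # length of each imagined future in seconds (step 1.3)
N_FUTURES = 100  # futures imagined per option (step 1.3)
GRID = 5.0       # rounding precision for the end points (step 1.4)


def simulate(state, steering, dt=DT, speed=10.0):
    """Hypothetical toy kart: state is (x, y, heading in degrees).

    Stands in for the real physics; the gain of 10 is an arbitrary assumption.
    Returns the new state and the distance raced during this step.
    """
    x, y, heading = state
    heading += steering * dt * 10.0
    rad = math.radians(heading)
    new_state = (x + speed * dt * math.cos(rad),
                 y + speed * dt * math.sin(rad),
                 heading)
    return new_state, speed * dt


def score_option(state, value, rng, lo=-5.0, hi=5.0):
    """Steps 1.2 to 1.6: score one option by its distinct futures."""
    start, _ = simulate(state, value)           # 1.2: take the option once
    steps = round(HORIZON / DT)
    distinct = {}                               # rounded end point -> raced distance
    for _future in range(N_FUTURES):            # 1.3: imagine the futures
        s, raced = start, 0.0
        for _step in range(steps):
            decision = lo + rng.random() * (hi - lo)   # 1.3.1: uniform in [lo, hi]
            s, d = simulate(s, decision)               # 1.3.2
            raced += d                                 # 1.3.3
        end = (GRID * round(s[0] / GRID), GRID * round(s[1] / GRID))  # 1.4
        distinct.setdefault(end, raced)         # 1.5: keep one future per end point
    return sum(r ** 2 for r in distinct.values())      # 1.6


def intelligent_decision(state, options, rng):
    """Steps 2 and 3: normalize the scores and average the option values."""
    scores = [score_option(state, v, rng) for v in options]
    total = sum(scores)
    if total == 0:
        # Assumption of this sketch: with no usable future, use the plain mean.
        return sum(options) / len(options)
    return sum(v * s / total for v, s in zip(options, scores))


if __name__ == "__main__":
    rng = random.Random(0)
    print(intelligent_decision((0.0, 0.0, 0.0), [-5.0, 5.0], rng))
```

Because the normalized scores are non-negative and sum to 1, the result always lies between the smallest and the largest option value.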
The algorithm looks pretty simple, and it applies the most accurate entropy I could get, deciding on the averaged options as in the paper... and it is so general and compact that one would think nothing more can be done to make it better.
But there are still some grey areas in the algorithm. For instance: why, once you have all the options scored, do you use a weighted average of them to get the final decision (pseudo-code point 3)?
It is a simple way to do it, and the paper uses it... but on some occasions it is somewhat far from being optimal!
Imagine you are the kart, and you have a Y-shaped bifurcation in front of you. Your options are -15 (left), -5 (slightly left), +5 (slightly right) and +15 (right).
If you go left or right, you drive into one of the two arms of the bifurcation, so you get a nice score on those two options (0.4 once normalized). Maybe you found 5 different futures on each of those two options, and each future had a length of 4 meters, so score = 5*4^2 = 5*16 = 80.
But if you make a small turn, left or right, you crash into the corner and so your score is lower: you find no more different futures than before (let's say 5 again) and they are shorter (let's say 2 meters), so score = 5 * 2^2 = 20.
So you have scores of (80, 20, 20 and 80) for options (-15, -5, +5, +15). If you normalize the scores by dividing by their sum, 200, you get pairs (option.value, option.score) like (-15, 0.4)(-5, 0.1)(+5, 0.1)(+15, 0.4).
If we plot these four points and use a cubic spline to interpolate scores for the options not considered (anything other than -15, -5, +5 or +15), this is what you get:
And here you can see the problem: the averaged value or intelligent decision is: Averaged_Decision = -15*0.4 - 5*0.1 + 5*0.1 + 15*0.4 = 0, yes, ZERO, the worst possible one!
In the graph you can see that 0, the averaged decision, has an interpolated score of about 0.07, while -15 or +15 have interpolated scores of 0.4. Then why do we choose 0?
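The numbers of this example can be checked with a few lines of Python. The exact spline used for the graph is not stated, so this sketch makes an assumption: it interpolates with the single cubic polynomial that passes through the four points (with only four points, a not-a-knot cubic spline reduces to exactly that polynomial). The helper name `interpolate` is made up for this sketch.

```python
options = [-15.0, -5.0, 5.0, 15.0]
raw_scores = [80.0, 20.0, 20.0, 80.0]

# Normalize the scores so they sum to 1 (pseudo-code point 2).
total = sum(raw_scores)
scores = [s / total for s in raw_scores]

# Weighted average of the option values (pseudo-code point 3).
avg_value = sum(v * s for v, s in zip(options, scores))


def interpolate(x, xs, ys):
    """Lagrange form of the unique cubic through the four (xs, ys) points.

    Assumption: a stand-in for whatever spline the original graph used.
    """
    result = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        result += term
    return result


avg_score = interpolate(avg_value, options, scores)
max_score = max(scores)
print("averaged decision:", avg_value)
print("interpolated score there:", round(avg_score, 4))
print("best option score:", max_score)
```

Under this assumption the interpolated score at the averaged decision comes out at roughly 0.06, in line with the "about 0.07" read off the graph and far below the 0.4 of the two best options.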
In a real kart, it means that, as it approaches a Y-shaped bifurcation, since turning fully left or fully right is equally good for it... it does nothing and continues straight into the corner in front. Usually, on getting too close to the corner, some small difference between left and right will make the kart finally decide to go left, but too late for a smooth drive: it needs to brake, then turn left. Not optimal.
I tried using the option with the maximum score to decide, but it was way too aggressive to be usable. Then I tried mixing the averaged and maximum values 50%-50%, or 80%-20%... eventually it started to work out somehow, but I needed a way to activate this "mix it with the maximum decision value" only in Y-shaped situations, as in the other cases using the averaged decision was smoother and better.
By comparing the averaged decision's score with the maximum score on the graph, I managed to make a smart averaging that I now call "AI Level 6" (don't worry, I stop at level 6!).
So now you have the averaged decision value (AvgValue = 0 in this example) and its score (AvgScore = 0.07 in the example), and the maximum decision value (MaxValue = -15 or +15, let's choose -15 for the example) and its score (MaxScore = 0.4 in the example), so you make the level 6 AI "refined decision" by mixing both values:
AvgCoef:= Min(1, 1 - ((MaxScore-3*AvgScore)/Max(0.001, 3*AvgScore)));
Level 6 decision = MaxValue * (1-AvgCoef) + AvgValue * AvgCoef
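As a quick check, here are the two formulas above transcribed into Python and applied to the example numbers. The function name `level6_decision` is made up for this sketch; the arithmetic follows the formulas as written.

```python
def level6_decision(avg_value, avg_score, max_value, max_score):
    """Blend the averaged and the maximum-score decisions.

    Returns (decision, avg_coef). avg_coef is capped at 1 and, as in the
    formula as written, has no lower bound.
    """
    denom = max(0.001, 3 * avg_score)
    avg_coef = min(1.0, 1.0 - (max_score - 3 * avg_score) / denom)
    decision = max_value * (1.0 - avg_coef) + avg_value * avg_coef
    return decision, avg_coef


if __name__ == "__main__":
    # Y-shaped example: AvgValue = 0, AvgScore = 0.07, MaxValue = -15, MaxScore = 0.4
    decision, coef = level6_decision(0.0, 0.07, -15.0, 0.4)
    print(round(coef, 3), round(decision, 2))
```

For the Y-shaped example this gives AvgCoef of about 0.095 and a decision of about -13.57, so the kart turns almost fully left. Whenever MaxScore is not more than 3 times AvgScore, AvgCoef is capped at 1 and the plain averaged decision is used. Note that once MaxScore exceeds 6 times AvgScore the coefficient becomes negative and the result overshoots MaxValue; clamping it at 0 would be one way to avoid that, if it is not intended.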
So I recommend you stick to "AI level 5", and just keep in mind that, in some cases, it may be better to help the AI take fast decisions by adding some spice like this level 6 I have tried here.