Entropic and Fractal Intelligence: Beyond entropy

Tuesday, 11 March 2014

Beyond entropy

Level 5 of intelligence seems to reflect the actual definition of entropy in the original paper, so before going any further, we will write it in pseudo-code, using the kart example from the video 1 entry:

1) For each option I can take (going left "value = -5" or going right "value = +5"):
1.2) Imagine you take this option and simulate, so you get the position after 0.1s.
1.3) From this point on, imagine 100 futures of 5 seconds each by iterating:
1.3.1) Take a random decision (value = -5 + random*(5-(-5)), with random between 0 and 1, so value falls between -5 and +5)
1.3.2) Simulate so you get your new position after 0.1s more.
1.3.3) Accumulate the small raced distance so you know the total "raced" distance at the end.
1.3.4) Stop when you have simulated 5 seconds into the future.
1.4) Now round the final points (x,y) of all 100 futures to a precision of 5: x=5*round(x/5), and likewise for y.
1.5) Discard futures with the same end point so you end up with only different futures.
1.6) Score this option with Score = Sum(raced^2) over all the different futures found.
2) Normalize the options' scores by:
2.1) Calculate AllScores = the sum of the scores of all the options.
2.2) Divide every option's score by AllScores.
3) Intelligent decision = Sum over all options (value * score)
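
The steps above can be sketched in Python as follows. This is only a minimal sketch, not the code used for the karts: the `step` function is a made-up toy physics (steering shifts x, the kart always advances in y), there are no walls so no future ever stops early, and all names and constants other than the numbers in the pseudo-code are assumptions for illustration.

```python
import random

DT = 0.1         # simulation step in seconds (pseudo-code 1.2 and 1.3.2)
HORIZON = 5.0    # length of each imagined future in seconds
N_FUTURES = 100  # futures imagined per option
GRID = 5.0       # rounding precision for the end points (1.4)


def step(state, steering, dt=DT):
    """Hypothetical toy physics; a real simulator would replace this.

    Returns the new (x, y) position and the distance raced in this step.
    """
    x, y = state
    new_x = x + steering * dt
    new_y = y + 1.0 * dt
    dist = ((new_x - x) ** 2 + (new_y - y) ** 2) ** 0.5
    return (new_x, new_y), dist


def score_option(state, option, rng, lo=-5.0, hi=5.0):
    start, _ = step(state, option)                        # 1.2
    distinct = {}                                         # end cell -> raced distance
    for _ in range(N_FUTURES):                            # 1.3
        pos, raced = start, 0.0
        for _ in range(round(HORIZON / DT)):              # 1.3.4
            steering = lo + rng.random() * (hi - lo)      # 1.3.1
            pos, d = step(pos, steering)                  # 1.3.2
            raced += d                                    # 1.3.3
        cell = (GRID * round(pos[0] / GRID),
                GRID * round(pos[1] / GRID))              # 1.4
        distinct.setdefault(cell, raced)                  # 1.5: keep one future per cell
    return sum(r ** 2 for r in distinct.values())         # 1.6


def decide(state, options=(-5.0, 5.0), seed=0):
    rng = random.Random(seed)
    scores = [score_option(state, v, rng) for v in options]
    total = sum(scores)                                   # 2.1
    weights = [s / total for s in scores]                 # 2.2
    return sum(v * w for v, w in zip(options, weights))   # 3


decision = decide((0.0, 0.0))
```

Because the result is a weighted average with weights that sum to 1, it always lies between the smallest and the largest option value.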

The algorithm looks pretty simple, and it applies the most accurate form of entropy I could get, deciding on the averaged options as in the paper... and it is so general and compact that one would think nothing more can be done to make it better.

But there are still some grey areas in the algorithm. For instance: why, once you have all the options scored, do you use a weighted average of them to get the final decision (pseudo-code point 3)?

It is a simple way to do it, and the paper uses it... but on some occasions it is rather far from optimal!

Imagine you are the kart, and you have a Y-shaped bifurcation in front of you. Your options are -15 (left), -5 (slightly left), +5 (slightly right) and +15 (right).

If you go left or right, you drive into one of the two arms of the bifurcation, so you get a nice 0.4 normalized score on each of those two options. Maybe you found 5 different futures on each of them, and each future had a length of 4, so score = 5*4^2 = 5*16 = 80.

But if you make a small turn, left or right, you crash into the corner, so your score is lower: in general you find fewer different futures and they are shorter. To keep the numbers simple, say you still find 5 of them but each is only 2 meters long, so score = 5 * 2^2 = 20.

So you have scores of (80, 20, 20 and 80) for options (-15, -5, +5, +15). If you normalize the scores by dividing by their sum, 200, you get (option.value, option.score) pairs of (-15, 0.4)(-5, 0.1)(+5, 0.1)(+15, 0.4).

If we plot these four points and use a cubic spline to interpolate scores for the options not considered (anything other than -15, -5, +5 and +15), this is what you get:

And here you can see the problem. The averaged value, or intelligent decision, is: Averaged_Decision = -15*0.4 - 5*0.1 + 5*0.1 + 15*0.4 = 0. Yes, ZERO, the worst possible one!
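
The arithmetic of the Y-shaped example can be checked with a few lines of Python. The variable names are only illustrative; the numbers are the ones used in the text.

```python
options = [-15, -5, 5, 15]
raw_scores = [80, 20, 20, 80]

# Pseudo-code point 2: normalize so the scores add up to 1.
total = sum(raw_scores)
weights = [s / total for s in raw_scores]

# Pseudo-code point 3: weighted average of the option values.
averaged = sum(v * w for v, w in zip(options, weights))

# For comparison: the option with the highest score (first one on ties).
best_value = options[weights.index(max(weights))]

print(weights, averaged, best_value)
```

The two strong options sit symmetrically around zero, so they cancel out in the average, while picking the best-scored option gives -15 (or +15, since they tie).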

In the graph you can see that 0, the averaged decision, has an interpolated score of about 0.07, while -15 and +15 have scores of 0.4. Then why do we choose 0?
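
To get a feeling for where a value like 0.07 comes from, here is a small sketch that interpolates the four points with the single cubic polynomial that passes through all of them (Lagrange form). This is a simpler interpolant than the spline used for the graph, chosen here only because it needs no libraries, so the result is close to the value read from the graph but not identical.

```python
def lagrange(points, x):
    """Evaluate at x the unique polynomial passing through all the points."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# (option value, normalized score) pairs from the Y-shaped example.
points = [(-15, 0.4), (-5, 0.1), (5, 0.1), (15, 0.4)]

avg_score = lagrange(points, 0.0)    # score at the averaged decision
side_score = lagrange(points, -15)   # reproduces the known score at -15
print(avg_score, side_score)
```

With this interpolant the score at 0 comes out at about 0.0625, well below the 0.4 of the two extreme options.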

In a real kart, this means that, as it approaches a Y-shaped bifurcation, turning fully left or fully right is equally good for it... so it does nothing and continues straight into the corner in front. Usually, on getting too close to the corner, some small difference between left and right will make the kart finally decide to go left, but too late for a smooth drive: it needs to brake, then turn left. Not optimal.

I tried using the maximum-score option to decide, but it was way too aggressive to be usable. Then I tried mixing the averaged and maximum values 50%-50%, or 80%-20%... Eventually it started to work somehow, but I needed a way to activate this "mix it with the maximum decision value" only in Y-shaped situations, as in the other cases the averaged decision was smoother and better.

By comparing the averaged decision's score with the maximum score on the graph, I managed to make a smart averaging that I now call "AI Level 6" (don't worry, I stop at level 6!).

So now you have the averaged decision value (AvgValue = 0 in this example) and its score (AvgScore = 0.07), and the maximum decision value (MaxValue = -15 or +15, let's choose -15 for the example) and its score (MaxScore = 0.4), so you make the level 6 AI "refined decision" by mixing both values:

AvgCoef:= Min(1, 1 - ((MaxScore-3*AvgScore)/Max(0.001, 3*AvgScore)));
Level 6 decision = MaxValue * (1-AvgCoef) + AvgValue * AvgCoef
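
As a worked example, the two lines above translate directly into Python. The function name and argument names are made up for this sketch; the formula itself is copied as written.

```python
def level6_decision(avg_value, avg_score, max_value, max_score):
    """Blend the averaged and the maximum decision (formula copied from the post)."""
    avg_coef = min(
        1.0,
        1.0 - (max_score - 3.0 * avg_score) / max(0.001, 3.0 * avg_score),
    )
    decision = max_value * (1.0 - avg_coef) + avg_value * avg_coef
    return decision, avg_coef


# Numbers from the Y-shaped example: AvgValue = 0, AvgScore = 0.07,
# MaxValue = -15, MaxScore = 0.4.
decision, coef = level6_decision(0.0, 0.07, -15.0, 0.4)
print(decision, coef)
```

With the numbers of the example, AvgCoef is about 0.095, so the refined decision is about -13.6: almost a full left turn instead of going straight. When MaxScore is no more than 3 times AvgScore, AvgCoef is 1 and the decision is the plain average. Note that, as written, the formula has no lower limit: if MaxScore is more than 6 times AvgScore, AvgCoef becomes negative and the result goes beyond MaxValue, so a real implementation may need to clamp it.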

Let's see how it behaves. In the next video, the yellow kart uses level 5, while the white one uses level 6. Notice the white one is more "nervous" than usual:

This formula is not THE formula; it is just one that works reasonably well and that I chose to apply for Level 6 AI. It is not always better than the level 5 one: on smooth circuits with slow karts, using just entropy with "AI level 5" is marginally better. But when things get tough, when karts have more power they can use, or the track is not so smooth, having level 6 on makes the kart react to hard situations somewhat earlier than standard.

So I recommend you stick to "AI level 5", and just keep in mind that, in some cases, it may be better to help the AI take fast decisions by adding some spice like this level 6 I have tried here.

11 comments:

Andy, 12 March 2014 at 14:32
Thanks for taking the time to write the pseudo code out. One question.

1.3.1) Take a random decision (value = -5 + random*(-5+5))
Is the -5 applied to each of the 100 iterations? Or is it just there for the initial position?

Andy, 13 March 2014 at 00:43
Ok. It seems to be what I expected. Just to make sure...
Step 1: set your initial location = your present location minus 5 (if you are doing the -5 option)
Step 2: Simulate a random move between two extremes (e.g. -5 and 5) from the location in step 1.
Step 3: Update your current position = previous simulated position + random move
Step 4: Repeat starting at Step 2 until you have reached the time goal (e.g. 10s)

Andy, 13 March 2014 at 14:41
Two other questions:

1. Assuming you only change the heading during a random simulation and assuming there were no boundaries (i.e. open space, no walls) wouldn't the raced distance always be equal and thus the probability also be the same?

2. Under what conditions does a simulated future terminate prematurely? I.e. How do we get shorter paths? running into a wall? others?

Andy, 14 March 2014 at 01:02
I've been going through your code. It would save me some time to know where I should look for these calculations.