When not all futures are equiprobable, you have to switch to another, more complex way of calculating entropy.
When you have N microstates but each one has a different probability of happening, call it P(i), the instantaneous or "classical" entropy is:
S = -Sum over all possible microstates i of (P(i)*Ln(P(i)))
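As a quick sanity check of that sum, here is a minimal Python sketch. It uses the usual convention with a leading minus sign, so the result is never negative, and it skips microstates with zero probability. The function name and the sample probabilities are made up for illustration only.

```python
import math

def entropy(probabilities):
    """Return -sum(p * ln(p)) over all microstates, in nats.

    Microstates with p == 0 are skipped, since p * ln(p) tends to 0 there.
    """
    return -sum(p * math.log(p) for p in probabilities if p > 0)

# Hypothetical example: four microstates, first equiprobable, then skewed.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]

print(entropy(uniform))  # matches ln(4), the equiprobable case Ln(N)
print(entropy(skewed))   # smaller than the equiprobable value
```

When all N microstates are equiprobable the sum collapses to Ln(N), which is why the simpler formula is enough in that case.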
If this were about gas molecules, the Ln(P(i)) part would correspond to the momentum of the molecules, so translated into our case it corresponds to m*v, the momentum of the kart. As the mass m is a constant, we can forget about it, and only v, the kart velocity, remains.
But we are talking about futures, so we need to integrate v, the kart velocity, over the path the kart followed. As we use constant time deltas to construct the futures, on each step v is proportional to the raced distance, call it x.
Integrating x over a path gives you r^2/2, with r = length of the path, so using r^2 seems to me like the perfect way to mimic the real entropy in the AI model (I may be wrong, I'm not sure I really understand all the inner workings of entropy at this level).
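Spelled out, and assuming x simply runs from 0 up to the full path length r, the integral is the elementary one:

```latex
\int_0^r x \, dx = \left[ \frac{x^2}{2} \right]_0^r = \frac{r^2}{2}
```

The factor 1/2 is the same for every future, so dropping it does not change which option gets the best score.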
So let's compare and see!
Yellow kart: Use level 3 with score = Ln(N).
Grey kart: Use level 4 with score = Sum(r) (r = raced distance).
Black kart: Use level 5 with score = Sum(r^2).
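To make the difference between the three scores concrete, here is a small Python sketch. It is not the actual kart code: it assumes each option is described only by the list of raced distances of its simulated futures, takes N to be the number of those futures, and uses invented numbers and function names.

```python
import math

def score_level3(distances):
    # Yellow kart: Ln(N), with N assumed to be the number of futures.
    return math.log(len(distances))

def score_level4(distances):
    # Grey kart: Sum(r), the total raced distance over all futures.
    return sum(distances)

def score_level5(distances):
    # Black kart: Sum(r^2), so long futures weigh much more than short ones.
    return sum(r * r for r in distances)

# Two hypothetical options with three futures each.
option_a = [10.0, 10.0, 10.0]  # three medium-length futures
option_b = [2.0, 2.0, 20.0]    # two short futures and one very long one

for name, score in [("level 3", score_level3),
                    ("level 4", score_level4),
                    ("level 5", score_level5)]:
    print(name, score(option_a), score(option_b))
```

With these made-up numbers level 3 cannot tell the two options apart, level 4 prefers option_a, and level 5 prefers option_b because the single long future dominates once distances are squared.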
In my humble opinion, level 5 on the black kart properly reflects the original paper's formulas for future entropy, and it does a really great job as a driver.
Can't we do any better? Is it a perfect AI? No, it is a nice and general algorithm, but there are still some aspects of it that can be tweaked for the better; not many, but some.