To be clear about the problem we are facing, I will go straight to the point: using 100 crazy blind monkeys to randomly drive the kart in the 100 futures I have to imagine, and then taking a "common sense" decision based on what those crazy monkeys did, maybe, just maybe, was not such a clever idea after all.
Using "crazy blind monkeys" (or totally random decisions, if you prefer not to joke about serious things) is the limiting factor in the simulations: you cannot simulate more than 6 to 10 seconds with them, because no monkey survives more than 10 seconds, so we are limited to "10 seconds ahead" strategies.
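To make the monkeys concrete, here is a minimal toy sketch of the idea, not my actual kart code. Everything in it is made up for illustration: a one-dimensional track where `x` is the sideways position, three steering actions, and a future that is scored simply by how many steps the monkey survives. It also imagines `n_futures` futures per action rather than 100 in total.

```python
import random

ACTIONS = (-0.5, 0.0, 0.5)  # steer left, go straight, steer right (toy values)

def monkey_future(x, first_action, horizon, rng, half_width=1.0):
    """Imagine one future: take first_action, then steer blindly at random.
    Returns the number of steps survived before leaving the track."""
    for step in range(horizon):
        x += first_action if step == 0 else rng.choice(ACTIONS)
        if abs(x) > half_width:
            return step  # the monkey crashed at this step
    return horizon       # the monkey survived the whole imagined future

def decide(x, horizon=10, n_futures=100, seed=0):
    """Pick the first action whose random futures survive longest on average."""
    rng = random.Random(seed)

    def average_survival(action):
        total = sum(monkey_future(x, action, horizon, rng) for _ in range(n_futures))
        return total / n_futures

    return max(ACTIONS, key=average_survival)

# Close to the right edge, steering right crashes at once, so it is never chosen.
choice = decide(0.9)
```

Even in this toy the limit shows up: make `horizon` much longer and almost every random future ends in a crash long before the end, so the extra length adds little information.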
Watching the first videos I created, I always wondered why I couldn't just bump the length of the imagined futures up to 10 minutes (instead of 6 seconds) and get an intelligence that starts the race and makes every move thinking only about how to win it... instead of how to take the next turn without breaking its neck.
Those crazy monkeys needed a replacement, but what can replace crazy monkeys at generating common sense, and do it even better? No kind of heuristic could do it, not inside my computer!
Then I realized I needed to use a somewhat lower level of this same AI to get an approximate path the monkeys would follow: give the monkeys a little of this intelligence medicine and watch them evolve (yes, like in Planet of the Apes!). But was it possible to build a lower version of the "common sense" algorithm and give it to the monkeys to drink? Nope.
So what I needed to give them was the full intelligence, somehow, but this hides a big problem. If you want the kart to be driven with common sense while you are imagining a future, you have to take into account the other players around you to avoid crashing into them, so you need to know which intelligent moves they will make in your imagined future... but they need to know your intelligent moves before they decide. You have a vicious circle: you need to know everything to be able to know a part of it.
But I had the solution in front of my eyes all the time: I have a machine that can take intelligent decisions with ANY system you can simulate, and with the crazy monkeys I can safely simulate a couple of minutes of racing without crashing, so I can recursively use the same algorithm again and again.
Imagine you pack up the whole simulation we were using, with the karts, the futures, the 100 crazy monkeys, the resulting AI helping the karts to survive the drive... all this is now "your system", and this new system is far more stable than the previous one (the kart with crazy monkeys that only lasted for a few seconds).
We have almost all we need, but there is something missing: the goals, the metric in the new
In the last post I showed you goal combos that made up quite nice intelligences, with some params you could play with: how much you love to race fast was the main positive goal, then how strong your tendency to save energy is, and finally how strong your tendency to keep your health high is.
Those are 3 free params I played with manually, trying to get the perfect combination. But what if I ask a second layer of intelligence to take this "macro simulation", simulate it in steps of 1 second to construct futures that last 1 or 2 minutes, and let the intelligence manage these new "joysticks", which are tied to the "love racing" or "hate being out of energy" strength params, adjusting them in real time?
That is the raw idea, and here is the result (not as a video, I will need some more coding time for this): you have hired a track engineer to assist the driver from the pit wall.
This engineer will simulate the race on his laptop, not at the pilot's level of deciding on a millisecond time scale, no: he will simulate the long-term evolution of the race and send messages to the driver like: "we need to reduce fuel consumption, adjust the keep energy goal from the current 0.5 strength to 0.7 as, in the long term, it is better than your current settings".
So this second layer works exactly like the first one, but its "system" is not the kart and the track, it is the first layer as a whole. The params this layer decides on are of a higher level, as the engineer will score things like "overtake this other kart". The engineer's goal could be something like Score=1/race_position, so a future in which he gets the kart to jump from position 3 to position 2 within the 2-minute time horizon he uses scores 1/2, while another future in which the kart couldn't overtake scores 1/3.
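The engineer's scoring can be sketched in a few lines. This is only an illustration under loud assumptions: `best_setting` and `toy_race` are hypothetical names, and `toy_race` is a made-up stand-in for "run layer 1 for two minutes and report the final position", with invented overtaking odds.

```python
from statistics import mean

def engineer_score(race_position):
    """Score = 1 / race_position: position 2 scores 1/2, position 3 scores 1/3."""
    return 1.0 / race_position

def best_setting(candidates, simulate, n_futures=100):
    """Return the candidate strength with the best average score.
    simulate(setting, future_index) must return the final race position."""
    def average_score(setting):
        return mean(engineer_score(simulate(setting, i)) for i in range(n_futures))
    return max(candidates, key=average_score)

def toy_race(keep_energy, future_index):
    """Made-up stand-in for a two-minute layer-1 simulation.
    Invented rule: with keep_energy 0.7 the kart overtakes in 6 of every
    10 futures, with any other value in only 3 of every 10."""
    overtakes = (future_index % 10) < (6 if keep_energy == 0.7 else 3)
    return 2 if overtakes else 3

# The engineer compares the current 0.5 strength against 0.7.
advice = best_setting((0.5, 0.7), toy_race)
```

With these invented odds the average score is about 0.43 for 0.7 against about 0.38 for 0.5, so the message to the driver would be to move the keep energy goal to 0.7.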
The effect is that the 3 params I used to set manually before a race are now intelligently adjusted every second, using 100 simulations of a minute or two of the race as it would unfold with only layer 1 driving.
This can be repeated many times: a "strategy engineer" could sit on top of the "track engineer" telling him "don't try to overtake Hamilton in the next few minutes, in five laps it will be pit-stop time, and it is far more convenient to try to overtake him there".
And then you could add a "team manager" that could give the "strategy engineer" higher-level orders, like "don't try to win the race if you are going to waste 2 engines, we need to keep them for the next races"... and so on.
Finally, Ecclestone (F1 owner) could send a message to all of them like "We need to lower the team budget limits to allow new teams to come in, as we need them for the health of the business, which is the highest goal for me: if we don't generate revenue, we will have to close F1".
As a bonus, all those layers "protect" your kart from being so afraid of crashing that it freezes, as the second-layer engineer will notice and adjust that fear to a lower strength... In the long term, successive layers make negative goals less and less dangerous, even unnecessary... well, this is my current bet.
So I need to make my current algorithm recursive and then just add more layers and watch the results... that's all!
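Here is a rough sketch of what "recursive" could look like, again as a toy and not my real code. All names are hypothetical (`decide`, `Kart`, `DrivenKart`, `keep_energy`), the physics is invented, and for simplicity one engineer step runs a single driver step instead of a full second of driving. The point is only that the same `decide` function serves both layers, because the second layer's "system" is the kart plus its driver.

```python
import copy
import random

def decide(system, options, horizon, n_futures, rng):
    """One generic layer: for each option, imagine random futures that start
    with it and return the option with the best average score."""
    def average(first):
        total = 0.0
        for _ in range(n_futures):
            future = system.clone()
            future.step(first)
            for _ in range(horizon - 1):
                future.step(rng.choice(options))
            total += future.score()
        return total / n_futures
    return max(options, key=average)

class Kart:
    """Layer-1 system (toy): the joystick is the throttle, energy is limited."""
    def __init__(self, energy=20.0):
        self.distance, self.energy, self.keep_energy = 0.0, energy, 0.5
    def clone(self):
        return copy.copy(self)
    def step(self, throttle):
        cost = throttle ** 2      # going fast burns much more energy
        if cost <= self.energy:   # not enough energy left: the kart does not move
            self.energy -= cost
            self.distance += throttle
    def score(self):
        # Driver goals: love distance, plus a weighted wish to keep energy.
        return self.distance + self.keep_energy * self.energy

class DrivenKart:
    """Layer-2 system: the kart AND its layer-1 driver packed together.
    The joystick is now the keep_energy strength, not the throttle."""
    def __init__(self, kart, rng):
        self.kart, self.rng = kart, rng
    def clone(self):
        return DrivenKart(self.kart.clone(), self.rng)
    def step(self, keep_energy):
        self.kart.keep_energy = keep_energy
        throttle = decide(self.kart, (0, 1, 2), horizon=3, n_futures=5, rng=self.rng)
        self.kart.step(throttle)
    def score(self):
        return self.kart.distance  # the engineer only cares about the long run

rng = random.Random(0)
race = DrivenKart(Kart(), rng)
strengths = (0.0, 0.5, 1.0)
advice = decide(race, strengths, horizon=10, n_futures=5, rng=rng)
```

A third layer would be one more wrapper class whose `step` calls `decide` on a `DrivenKart`, and so on up to the team manager.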
Oops! I forgot! What about the "entropic" side of all this?
Having a new layer lets you predict the future over a longer term, so you take care to produce entropy in a more efficient way, which in turn makes you much "smarter".
If you want a real-world case of this "adding a new layer" process, here is one:
Humanity was never able to predict what would happen to it over a 200-year horizon, but now it is starting to do so, and a new concept has emerged: we must act with "sustainability" or we won't survive more than 100 or 200 years from now.
By adding the goal of "being sustainable" we have a kart (or a humankind) able to produce entropy at a nice pace for the next 200 years or more. That is far more optimal than using all the available energy in the next 10 years and then ceasing to exist.
Entropy is now being created in the best way when considering the next 200 years instead of 10.