Enjoy feelings
Once I had the simulation with the goal-less algorithm working, I wanted to go further. The kart was driving quite nicely, but it clearly was not optimal. Why? The idea was so simple and powerful that the problem was not clear at first glance.
Trying to improve
I tried using longer and longer futures, with bad effects. I also greatly increased the number of futures calculated, but that only showed a marginal gain. This was not the root of the problem.
The real problem was with the scoring of the futures. Always scoring a future as one was not fair. In some futures the kart crashed quite near the starting point, while in others it was able to safely go far away. You can't say both futures are worth the same to you; it had to be an oversimplification.
I decided to try the most evident candidate: the length raced in each future would be its score, so we no longer use N = "number of different futures" to score an option; instead we use the sum of the distance raced over each future.
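To make the change concrete, here is a minimal sketch of the two scoring schemes (in Python, and not the original code; the Future type and the distance_raced field are illustrative assumptions):

```python
from collections import namedtuple

# Illustrative only: a future reduced to the total distance it raced.
Future = namedtuple("Future", ["distance_raced"])

def score_option_old(futures):
    # Old goal-less scoring: every simulated future counts as 1,
    # so an option's score is just N, the number of its futures.
    return len(futures)

def score_option_new(futures):
    # New scoring: each future contributes the distance it raced, so a
    # future that crashes near the start is worth less than one that
    # safely goes far away.
    return sum(f.distance_raced for f in futures)

futures = [Future(3.0), Future(45.0)]   # one early crash, one long run
print(score_option_old(futures))        # 2
print(score_option_new(futures))        # 48.0
```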
With this new option-scoring scheme, the intelligence received a great boost, as you can see in the following video, where the old goal-less intelligence is clearly outperformed by two of the new models (one using the new "distance raced" as score, and the other the squared distance raced).
Something incredible happened: now the agents seem to just like speeding, so they behave and drive more aggressively. The one with the squared distance was even more aggressive and finally won; it raced more distance in general, but it also had a tendency to be a little too imprudent at times.
In retrospect, this test was a perfect success and I could not make it any better today. I chose the correct formula -distance raced, without the square- for the task (but first I tried hundreds of others, I confess) for several technical reasons:
1) Distance raced is a real way to measure the entropy of a moving particle, such as the kart.
The entropy a particle has, when you consider a gas, can be approximated by its linear momentum v*m, so a path integral of this momentum over the future's path (a red or blue line in the videos showing futures), the path integral of v*m*dt, would be a perfect candidate for assigning an entropy to the path of a future.
But m is a constant in all my futures, so I can safely discard it -we will normalize afterwards, so it doesn't make any difference- and v*dt = distance raced, so we are integrating the distances raced at each time step. That is why the distance raced is the correct way to give a moving particle some form of entropy-gain approximation.
Depending on how you do the path integral, integrating over dt or over dx (the length of the delta of path at each step), you will end up with the distance raced or with the squared version of it, as the sketch below illustrates. Both are similar ways to compute a real entropy; you only change the physical model for which you calculate the classic entropy.
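As an illustration, here is how both integrals might look over one simulated future, assuming it is stored as a list of per-step speeds (all names here are mine, not from the original code):

```python
def distance_score(speeds, dt):
    # Path integral over dt: sum of v*dt = total distance raced.
    return sum(v * dt for v in speeds)

def squared_score(speeds, dt):
    # Path integral over dx (with dx = v*dt): sum of v*(v*dt), the
    # "squared" flavour that rewards high speeds even more strongly.
    return sum(v * (v * dt) for v in speeds)

speeds = [10.0, 12.0, 15.0]          # per-step speeds of one future
print(distance_score(speeds, 0.1))   # 3.7
print(squared_score(speeds, 0.1))    # 46.9
```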
2) Using a real distance to score the future is equivalent to having a real metric in the state space of the system. This also applies to the squared distance raced of the third winning kart.
If you define the distance from state A to state B as the minimum distance raced by a future starting in A and ending in B, you have a real metric on the space of all possible states of the system.
Enjoy feelings
The fact that the score we calculate at each step has the form v*dt is quite important to understand how we were quietly introducing "feelings" into the mix. We wanted the agent to love speeding, and we ended up using as score the "speed" you are experiencing at each moment, multiplied by the amount of time you enjoy it, "dt".
Enjoy feelings represent anything the system, in some relaxed sense, enjoys experiencing. Something positive that accumulates with time and that you can't lose, like the distance raced.
You will need to have enough "enjoy feelings" in your simulations, as the emotional intelligence desperately needs them to even work. Having an enjoy feeling of zero means the agent will stop deciding and freeze forever. It is dead. From then on it will only follow the laws of physics of the simulation.
Other examples
Luckily, all the other goals you would need to add to your intelligence will always have an "enjoy" feeling associated. It is a must, and you, as the "designer" of this intelligence, have to find the positive in it, the "bright side". So the golden rule here will be: never add a goal without an enjoy feeling associated.
For instance, a goal created to avoid damage to the rocket, the "take care of health" goal, will have an enjoy feeling associated with your actual health (from 1 to 0), as if being healthy, per se, were a way of enjoying as valid as speeding was for a kart pilot.
In this case, just by having 50% health at a given point in time, you add to this step of the future an enjoy score of 0.5*dt, meaning you not only enjoyed racing at 200 km/h for some dt seconds with v*dt, you also enjoyed the health you have, again multiplied by the time you enjoyed it.
I have always ended up identifying one "thing" you enjoy, associated with the goal or motivation I needed to model, and then assuming the agent enjoys it for a delta of time.
The goal "take care of your energy" is quite similar. In this case, the thing you enjoy is "having energy", and the energy level (from 1 to 0) so your enjoy is energy*dt.
Another "only enjoy" feeling I use is "get drops" and "store drops". When the rockets take energy from drops to the storages, they are really enjoying it as much as speeding. In this case enjoy = energy transmited = speed of the energy transmision * dt. You enjoy the "speed" of the transmision, not the energy transmited, as you need to use *dt some how in your formula.
Note: The scale of each feeling has to be manually adjusted in this implementation. As the unit I used how much you enjoy racing the length of your own body. With this in mind, I judged it fair to use h and e from 0 to 1 as their enjoy feeling values. As you can later set a "strength" for each feeling in the implementation, this scale is not fixed and can be adjusted in real time. This emotional intelligence is not able to auto-adjust the feeling scales -or strengths- to get a better mix; I manually adjust them before every simulation. This will be elegantly addressed in the next version of the model, the "layered" model I am currently working on.
Could enjoy feelings exist without the "*dt" at the end? No. If you try it, maybe you will be able to adjust it into something useful. But if you now switch the delta time from 0.1 s to a finer 0.01 s, the effect is that your useful goal now weighs x100 compared to all the other goals that used dt in their formulation. Being so highly dependent on small changes in the delta of time makes it a bad idea to add it to the mix.
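A toy check of this scaling argument, assuming the per-step scores are simply summed over a fixed one-second future (purely illustrative; the exact blow-up factor depends on how the scores are accumulated):

```python
# Sum per-step scores over a fixed 1-second future and watch how a
# dt-less feeling blows up relative to a well-behaved v*dt feeling.
v = 20.0                                  # arbitrary constant speed
for dt in (0.1, 0.01):
    steps = int(1.0 / dt)
    with_dt = sum(v * dt for _ in range(steps))   # stays ~20 for any dt
    no_dt = sum(1.0 for _ in range(steps))        # grows as 1/dt
    print(f"dt={dt}: v*dt total={with_dt:.1f}, dt-less total={no_dt:.0f}")
```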
Scoring a future with several enjoy feelings
A kart pilot with only one enjoy feeling was a simplistic case. In general, we need to deal with agents that have a large number of them, so we need to know how to actually combine them into a single score. The answer was in the speeding goal we already used.
Remember we used v*dt as the enjoy feeling. But v is composed of two vector components, vx and vy. We could have had two goals instead of one and still gotten the same intelligence, so both ways have to be equivalent.
As the value of v is sqrt(vx² + vy²) -where sqrt() is the square root- then, if we add the enjoy feeling for health (h) and the enjoy feeling for energy (e), the total enjoy feeling should be:
Enjoy = sqrt(vx² + vy² + h² + e²) * dt = sqrt(v² + h² + e²) * dt
This is the way I add up all my enjoy feelings, one from each goal (as they always have some positive enjoy feeling associated), to get the enjoy feeling (named "Points" in the code) corresponding to every step of the future, computing it after each state change.
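In code, the per-step mixing could look like this minimal sketch ("Points" is the name the original code uses for the result; the argument names are my assumptions):

```python
from math import sqrt

def step_points(vx, vy, health, energy, dt):
    # Treat every enjoy feeling as one more component of a single
    # vector and take its Euclidean norm, exactly as v mixes vx and vy,
    # then multiply by the time dt you enjoyed it.
    return sqrt(vx**2 + vy**2 + health**2 + energy**2) * dt
```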
By the way, this makes the mix of enjoy feelings always be a real metric over the state space, as we mentioned earlier, no matter how many enjoy feelings you add.
Will it need more feelings?
Enjoy feelings account for all the good things you can detect while you are imagining a future. They would be enough in a world with no dangers, no hunger, no way to harm yourself and die, no enemies... but that is not the kind of world we need to simulate; we need the intelligence to cope with dangers, with batteries that drain, with bodies that can be broken, with other agents that will compete.
We are going to need more than just enjoy feelings once we are out of candy land.
In the next three posts of this series we will deal with different ways to cope with danger, and how to mix them to get a realistic model of how we feel and react to a danger -such as the possibility of losing something, be it health, energy... or money- and how it modulates the enjoy feelings.