In the physical layer of the algorithm we always talk about entropy, and this time is no different, so let's go down to the basics to understand how to think about negative goals the right way.
A living thing is a polar structure: it follows two apparently opposite laws of entropy at the same time.
First of all, a living thing is a physical thing, so all the physical laws of the macroscopic world we live in apply, and that means obeying the second law of thermodynamics: the instantaneous entropy always has to grow, and in the optimum possible way.
On top of this physical law sits the second one: keep your internal entropy as low as possible.
This duality is what underlies the need for some kind of negative goals: even if entropy always grows (so positive goals are very natural to the algorithm), you need to keep your internal entropy low, so you need to add some kind of limiting factor to avoid damaging yourself by letting your internal entropy grow too much (too much means you die, a little too high means you are ill).
Imagine you travel back in time to when the first living cell started to use "common sense". We will try to follow its development, as it will give us the clues to negative goals.
I imagine a living cell floating in dirty water. Its structure is new on Earth, and it allows the cell to keep quite a low entropy inside: the cell is ordered, with different parts for different purposes, and if you look inside a lot of cells, the number of ways they can be organized internally is really, really low compared to a similar volume of dirty water, which is much more chaotic than the cell interior.
This "living" structure is much better at generating entropy: if you want to take a mountain -very low entropy, tomorrow the montain will be like today, not many changes possible- and blast it into a bunch of rocks, stones and sand, you can wait for the laws of physic to do their work (erosion will eventually do this for you) or add life. If you plant a lot of plants, trees, mouses, bears and so, in centuries instead of millions of years, you can have a totally eroded, caved and changed montain.
Now, by pure randomness mixed with natural selection, in some of those cells -in this case a primitive alga capable of photosynthesis- a new structure gradually appears: a light detector placed on the border of the cell, plus a little tail on the opposite side, both connected by a single "wire". A neuronal system so simple it only has a detector, a wire and an actuator.
With 4 or 5 of those structures placed along the cell's edge, you have a totally different thing: this living thing is now intelligent, in some primitive but genuine way.
When light hits half of the cell, the detectors on that part will fire and the wires will pass the signal to the little tails, which will start moving. As a result, the cell will gently travel toward the light: it will escape from shadows and move into sunny zones.
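Just to make the idea concrete, here is a tiny toy sketch in Python. This is my own invention, not code from the real simulation, and all the names, angles and numbers are just illustrative: each detector sits at some angle on the cell's edge, and when it is lit, its tail on the opposite side pushes the cell toward that detector.

```python
import math

# Toy model of the detector-wire-tail cell (illustrative only).
# Each detector-wire-tail unit is identified by the angle of its detector
# on the cell's edge; the tail sits on the opposite side.
DETECTOR_ANGLES = [0.0, 90.0, 180.0, 270.0]

def detector_fires(detector_angle, light_angle, lit_half=90.0):
    """A detector fires when it lies on the lit half of the cell."""
    diff = (detector_angle - light_angle + 180.0) % 360.0 - 180.0
    return abs(diff) <= lit_half

def cell_thrust(light_angle):
    """Sum the pushes of all moving tails.

    A tail sits opposite its detector, so its push points toward the
    detector that fired; the net result points toward the light."""
    tx = ty = 0.0
    for angle in DETECTOR_ANGLES:
        if detector_fires(angle, light_angle):
            tx += math.cos(math.radians(angle))
            ty += math.sin(math.radians(angle))
    return tx, ty

print(cell_thrust(0.0))  # light coming from angle 0 -> net push roughly along +x
```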
This simple cell contains all the ingredients for an intelligence, but at such a low level that it makes a perfect structure for our thought experiment with negative goals.
Why do you need negative goals in this example of the green cell? Because when the cell reaches a sunny spot, the detectors will keep firing, all at the same time, so all the "tails" will keep moving for as long as the cell is having its sun bath. You will spend more energy than you can get from the sunlight!
How did nature deal with that? Negative scoring, but the neuronal version of it: inhibition of signals.
Go back to the last example: the cell has 4 of those detector-wire-tail structures that help it move toward the light. Now add a simple neural network with only one neuron, connected to the four detectors as inputs and to the four tails as outputs.
If all four detectors fire at the same time, their signals reach this neuron and activate it. If only three of the detectors fire, this neuron won't fire. This neuron is detecting the case "it is sunny everywhere": it will only fire on this event.
When "the" neuron -its brain only has this one- fires, it pass this to the four "tails" in such a way, it will "inhivite" the firing of the tail movement.
You have avoided this ugly case of "all sunny" with a simple but "negative" signal coming from one of the simplest neural networks possible.
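In code, this one-neuron "brain" could look something like the sketch below (again just an illustration of the idea, not anything taken from the simulation):

```python
def tails_to_move(firing):
    """firing: one boolean per detector-wire-tail unit.

    A single inhibitory neuron fires only when *all* detectors fire
    ("it is sunny everywhere") and then silences every tail."""
    if all(firing):                    # the one-neuron brain detects "all sunny"
        return [False] * len(firing)   # inhibition: no tail moves, no energy wasted
    return firing                      # otherwise each firing detector drives its own tail

print(tails_to_move([True, True, True, False]))  # partial light -> three tails move
print(tails_to_move([True, True, True, True]))   # all sunny -> all tails rest
```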
How does it look in the real algorithm? Quite simple too!
I added to the simulation a "health" and an "energy" value for each of the players -karts or rockets- and, when scoring each future, instead of just using the square of the distance raced, I multiplied it by the health level (from 0=dead to 1=perfect) and the energy level (from 0=depleted to 1=full). That was all.
The effect was: if in a future I crash and my health drops from 100% to 10%, then the positive goals collected in that future trace will be scaled down by *0.1, so futures without crashes will automatically be 10 times more appealing than the ones where I lose 90% of my health.
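As a sketch in Python (the function and parameter names are mine, only the formula itself comes from the text above), the scoring of one future trace is just:

```python
def score_future(distance_raced, health, energy):
    """Score of one simulated future: squared distance raced,
    scaled down by health (0..1) and energy (0..1)."""
    return (distance_raced ** 2) * health * energy

# A future where the kart crashes and drops to 10% health is worth
# ten times less than one covering the same distance unharmed:
print(score_future(100.0, health=1.0, energy=1.0))  # 10000.0
print(score_future(100.0, health=0.1, energy=1.0))  # 1000.0
```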
So let's have a look at 10 intelligent players -5 karts and 5 rockets- moving around a circuit filled with small drops of energy. Getting a drop doesn't give any score by itself, but the energy you get will make the energy level pop up from 0.85 to 0.95, so in the same way that not crashing was interesting, so is getting a drop.
Just to make it a little more natural, I added to the rockets the possibility of landing gently on the black pixels and resting to slowly refill their energy. But I never told them to land or anything similar; I just said: a future scores (raced^2)*health*energy. All the behaviours you will see just emerged from this formula (plus the neutral common sense).
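A quick numerical sketch of why the drops matter even though they give no score by themselves (the distance figure is only illustrative; the 0.85 and 0.95 energy levels are the ones mentioned above):

```python
def score_future(distance_raced, health, energy):
    return (distance_raced ** 2) * health * energy

# Two imagined futures covering the same distance with full health;
# the only difference is whether an energy drop was picked up along the way:
without_drop = score_future(100.0, health=1.0, energy=0.85)  # 8500.0
with_drop    = score_future(100.0, health=1.0, energy=0.95)  # 9500.0
# The future containing the drop scores higher, so the player
# naturally steers toward the drops without being told to.
```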
There exists a second way to do it, which I assume appeared in nature much later, but it works quite differently: fear.
But this post was about reduction goals, so I will stop here!