Before going any further into the algorithm itself, let's stop for a moment on the real meaning of those "causal entropic forces" the algorithm is based on.
This is a little technical -but quite interesting- and I will try my best to make it easy to follow, but feel free to skip it and focus on the algorithmic-only articles if you want. You will get as much knowledge of the AI as you need to apply it, but be warned: when it comes to defining your own system, adjusting the params of the AI and polishing the way you measure how much better a new situation is compared to a previous one, understanding the underlying physics will give you extra insight into the process and will help you pinpoint the weak points of your implementation.
Disclaimer: I am not a physicist, just an oxidized mathematician and a programmer who loves reading about the subject, so please be benevolent when commenting! I just wanted anyone to be able to get a clear picture of the concept itself and of the extreme power behind the nice-sounding word "entropy".
Entropy
Entropy is a very powerful physical concept; it is behind almost all the laws of classical physics. It is quite simple in its definition, but almost impossible to use directly in any real-world calculation.
Definition
An informal definition of entropy could go like this:
Imagine you have a transparent ball of some radius, and you fill it with a hive of 100 flying bees. In the jargon, this is our "system", and, as in this case, it is supposed to be "isolated" from the rest of the universe.
From the outside, you just see a ball filled with a lot of bees. This is the "macro state" of the system: what you can see of it at a given moment.
But if you place a fixed camera near the ball and take a series of consecutive images, when you later compare them you will find that most of them are quite different from the others. The bees are in different positions each time you freeze time and take a shot. Each one of these "different images" is called a "micro state" of the system. It is like a finely detailed version of the macro state.
If you use N for the maximum number of different "images" -or micro states- you could get (apart from the megapixel count of your camera and how you define "different", it is assumed you use some given precision in your readings), this N basically measures what physicists call the "entropy" of the system.
If we now reduce the radius of this transparent ball, the bees will get nearer to each other, and the number N of different images you could get will drastically decrease. When the radius reaches a minimum, the bees are so tightly packed they just can't move. In this situation, you will only have one single possible picture, or micro state. They will be at maximum order and minimum disorder, or entropy.
The actual formula for the entropy of a macro state with N possible micro states is S = k·log(N). There is an alternative formulation, but it is not so important in this story. Just remember this: the more different "images" you can take of your "ball with bees", the more disordered the system is and the higher the entropy will be.
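To put some toy numbers on that formula, here is a tiny sketch in Python (my own illustration: the micro-state counts are invented for the "ball of bees" picture, and k is set to 1 since the units don't matter here):

```python
import math

# Toy numbers for S = k * log(N); units are arbitrary, so k = 1.
# The micro-state counts below are invented: a big ball allows many
# distinguishable pictures, a fully packed one allows only a single one.
k = 1.0

for radius_cm, n_microstates in [(50, 10**12), (20, 10**6), (5, 10**2), (1, 1)]:
    entropy = k * math.log(n_microstates)
    print(f"radius {radius_cm:>2} cm -> N = {n_microstates:<13d} -> S = {entropy:.2f}")

# With a single possible picture (N = 1), S = log(1) = 0: maximum order, zero entropy.
```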
Entropy and physical laws
So far we have only seen the mathematical definition of entropy; now I will try to show you how important this concept is, starting with physics.
One of the most important laws in thermodynamics, the second law, talks about entropy: "the entropy of an isolated system always grows". What really matters here is that it will never decrease: any real physical system will evolve, no matter which laws it has to obey, in such a way that its entropy -or disorder- always increases with time.
This is powerful, but there is a much more refined version: a system always evolves in exactly the way that makes its entropy grow as much as possible at any given moment.
As an example, if you have an empty tank connected to another one filled with a gas, and you then open the connection, the gas will flow from one tank to the other in a way you could calculate using known flow theories, and you would get the approximate evolution of the gas.
But if you could calculate, among all the possible small changes that could happen to the gas at any given moment, exactly the one that would give you the highest possible entropy, then you would know exactly how the gas will evolve, as any system evolves by choosing exactly the path that maximizes entropy creation at each little step. And the most interesting thing is that this is true for all kinds of physical systems.
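To make that "greedy entropy maximization" idea concrete, here is a little sketch of the two-tank example. It is my own toy model, not a real gas simulation: with N particles and n of them in the left tank, I count the micro states of that macro state with the binomial coefficient C(N, n), and at each step I apply the small change that maximizes the entropy right now.

```python
import math

# Toy "two connected tanks" experiment. With N particles and n of them in the
# left tank, the number of micro states of that macro state is C(N, n), so the
# entropy is S = k * log(C(N, n)) (k = 1 here, units are arbitrary).

def entropy(total, left):
    return math.log(math.comb(total, left))

N = 100          # particles in total
left = N         # start with all the gas in the left tank

for step in range(200):
    # Candidate small changes: move one particle left->right, right->left, or do nothing.
    candidates = [c for c in (left - 1, left, left + 1) if 0 <= c <= N]
    # Greedy rule: pick the change that maximizes the entropy *right now*.
    left = max(candidates, key=lambda c: entropy(N, c))

print(left)  # settles around N / 2: the gas ends up evenly spread between both tanks
```

The greedy rule alone, with no flow theory at all, already reproduces the behaviour you would expect from the gas.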
Note: technically it only applies to "probabilistic" systems, meaning systems formed by a myriad of small parts (macroscopic systems, in short). It does NOT apply to small nano systems with just a few particles, like the ones in some quantum studies, where entropy can sometimes decrease (the same news item is also available in Spanish here).
So it is said that nature tends to make entropy grow as much as possible on all occasions, and this tendency, in some way, is what makes the classical laws of physics emerge when you look at the system from a distance.
It is very important to note that, when we talk here about tending to the state of highest entropy, we are talking about immediate growth. In no way does nature "know" which entropy the system will have in a few seconds. It is a short-term tendency.
In this story, what we called a "tendency" can be understood as a kind of new "force" that emerges from nowhere and physically pushes the system toward the state that maximizes the entropy in the short term.
This magical force is technically called an "entropic force", as opposed to the "physical forces" we all already know of.
Calculating the physical forces in a system is easy with today's computers, but computing the entropy of a real-world physical system is just a dream, not to mention calculating the path along which the entropy grows the fastest, which is what you need to get the "entropic forces". But if we knew how to calculate them, then using either set of forces, the physical or the entropic ones, should give you the same results.
Entropy and life
So nature always tends to make entropy grow, ok, but somehow nature managed to fool itself with a curious invention called "life".
Life is not easy to define, but my favourite definition is, again, about entropy, and can be expressed this way:
When a "subsystem" -an ameba for instance- that is inside a bigger system -the ameba and its environment- is capable of consistently lowering the subsystem entropy, at the logical expense of "pushing" it away and into the environment, it is said that this subsystem is alive.
It is not a complete definition, and maybe your phone can do this too, but all the things we assume are living creatures follow this rule, while "dead" systems do not. There are always things in between, like a virus or a robot with solar cells, but a cell definitely keeps its interior clean and tidy while polluting its near environment with heat -which increases the entropy of the medium- and detritus.
Humans just do this with the whole planet, in a more efficient way. We are, on a planetary scale, alive.
Entropy and intelligence
So systems gain entropy with time, while living things manage to lower their internal entropy somehow. But what makes a living being that acts intelligently different from a dumb one?
Some studies, such as the paper that inspired me, have suggested that deciding solely on the basis of having more different reachable futures is quite a nice strategy for generating "intelligent behaviour" in a system.
The idea is simple:
Suppose you have some possible bright futures ahead, and you have to decide at each moment among several options, let's say only two to keep it simple. Your decisions will make you reach one future or another, so the question here is how to decide between doing "A" and doing "B".
So if having more accessible futures is a good strategy, then I could count how many of those bright futures that I can imagine start by choosing "A", and how many start by choosing "B". The option with more futures is your best bet, and as long as you didn't hide some not-so-bright futures in the process, the decision will be basically right.
This is almost exactly what the kart AI uses in the video 1 entry, so you can have a look at this simple idea applied to a kart, if you haven't done so already.
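To make the counting rule concrete, here is a minimal sketch. This is NOT the real kart code from the download page, just my own toy setup: an agent on a 10x10 grid, where stepping off the grid "crashes" it and ends its future. The grid size, the crash rule and the 5-move horizon are all invented for the illustration; only the counting logic is the point.

```python
from itertools import product

SIZE = 10
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
CRASHED = None  # leaving the grid "crashes" the agent; a crashed agent stays crashed

def step(pos, move):
    if pos is CRASHED:
        return CRASHED
    x, y = pos[0] + MOVES[move][0], pos[1] + MOVES[move][1]
    return (x, y) if 0 <= x < SIZE and 0 <= y < SIZE else CRASHED

def count_futures(pos, first_move, horizon=5):
    """How many different end states can be reached if we start with first_move?"""
    finals = set()
    for tail in product(MOVES, repeat=horizon):   # enumerate every possible future
        p = step(pos, first_move)
        for m in tail:
            p = step(p, m)
        finals.add(p)
    return len(finals)

start = (1, 1)                                    # standing right next to the left wall
scores = {m: count_futures(start, m) for m in ("left", "right")}
print(scores)                                     # moving "right" opens more futures
print(max(scores, key=scores.get))                # -> 'right'
```

The real kart AI cannot enumerate every future, so it samples random ones (Monte Carlo style) instead of using itertools.product, but the decision rule is the same: take the option whose set of reachable futures is bigger.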
Surprisingly, this sounds quite similar to the "usual" entropy definition!
We can actually define this kind of intelligent behaviour as a natural tendency to always choose whatever maximizes the number of different reachable futures.
So if we unofficially call "the entropy of your futures" the number of different futures you can reach at a given moment from your current position (well, technically we have swapped the different "micro states" found in the present time, from the classical entropy definition, for different "macro states" found in the future), then, if you were an intelligent agent, you would tend to change your position in such a way that this "entropy of your futures" grows as fast as possible.
This tendency that any intelligent agent has to increase its number of reachable futures can be seen, as in the previous physics example, as an entropic force that somehow pushes the agent to do what it does. When this behaviour is seen as a force, as this algorithm does, that force is called a "causal entropic force".
Please note this is not exactly what the paper suggests, nor is it a real "entropy"; it is just an oversimplified version that, surprisingly, works quite well. In the same way that, in the "usual" entropy definition, we used k·log(N) as the entropy value, here the entropy of the reachable futures is a little more complicated than just an "N". In the paper it takes the form of an integral over all possible future paths of an expression involving the probabilities of the different futures. Quite intimidating, but we will come back to this in a later article.
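For the curious, and purely from memory (so take this as orientation, not as an exact quote of the paper), the formulas look roughly like this: a "causal path entropy" of a state X over a time horizon τ, and a force proportional to its gradient.

```latex
% Rough shape only, reconstructed from memory of the paper -- not an exact quote.
% S_c is the "causal path entropy" of a (macro) state X over a time horizon \tau,
% obtained by integrating over all possible future paths x(t) starting from it:
S_c(X,\tau) \;=\; -k_B \int \Pr\bigl(x(t)\mid x(0)\bigr)\,
                  \ln \Pr\bigl(x(t)\mid x(0)\bigr)\, \mathcal{D}x(t)

% The "causal entropic force" is then the gradient of this entropy, scaled by a
% kind of temperature T_c that sets the strength of the push:
F(X_0,\tau) \;=\; T_c \,\nabla_X \, S_c(X,\tau)\Big|_{X_0}
```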
What is really remarkable about this idea is that, in the same way the growth of entropy could explain all the laws of nature, if you code it right, this algorithm would decide intelligently about anything you place in front of it, whatever "intelligently" means for each particular case. This is truly remarkable if it proves to be true (merely showing intelligent behaviour in some tests doesn't prove anything so general).
That was all about entropy. A lot of important details were intentionally left out, as I warned you, but I hope you now have a reasonably complete view of what this algorithm is based on, and of how powerful it could be if we manage to generalize it and apply it to very different systems.
Thanks for this explanation. I read the original paper, but this post really helped to clarify my understanding. The phrase that gave me my "Ah!" moment was, "how many of those bright futures start by choosing A, and how many start by choosing B...."
If I may make a suggestion: all the posts would benefit from a bit of pseudo code and a sample calculation.
Finally, I am really enjoying this blog!! I woke today and thought... "great... He just posted another entry!"
Hi andy, it is really nice to know you are having fun reading this, really!
ReplyDeleteMy "Ah!" moment was when, after reading it was about reaching as many futures as possible and then having a look at the paper, I found my self in front of this formulae with a double path integral on some probability I never heard about... I was just to give up, but then I read "Montecarlo" and all made sense.
About inserting pseudo-code, I will try to remember. I did it on the video 1 entry and, from that point on, I just commented on small changes to that first case, but it would be better to have a real example with numbers and pseudo code in the last entry, about levels 4 and 5.
I promise to edit and add it!
And anyhow, remember you have real code on the download page. It is a V0.7 that had a big bug that made it quite slow and sometimes the kart refused to start and just froze; next Monday I will upgrade it to V0.8 with this fixed and with the ability to use AI levels 1 to 6 on each kart, so comparing algorithms will be very easy.