Sunday 18 June 2017

OpenAI first record!

We have just submitted our first official score to the OpenAI Gym for the Atari game "MsPacman-ram-v0", played from RAM (so you do not see the screen image; instead you "see" a 128-byte RAM dump).
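
For readers who want to poke at the same setup, here is a minimal sketch (not our submission code) of what that RAM observation looks like through the gym API of the time; the environment name is the real one, everything else is just illustration:

```python
# Minimal sketch: inspecting the 128-byte RAM observation of MsPacman-ram-v0
# using the OpenAI gym API as it was in 2017 (obs, reward, done, info).
import gym

env = gym.make("MsPacman-ram-v0")
obs = env.reset()

print(obs.shape)         # (128,) -> one byte per Atari 2600 RAM cell
print(env.action_space)  # the discrete set of joystick actions

obs, reward, done, info = env.step(env.action_space.sample())
print(reward, done)
```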

Our just-submitted algorithm "Fractal AI" played 100 consecutive games -the minimum allowed for an official score- and got an average score for its best 10 games of 11543 +/- 492, well above the previous record of 9106 +/- 143, so we are currently #1 on this particular Atari game:



The previous record, held by MontrealAI, was an average score of 9106 +/- 143 after playing about 300k consecutive games. When you inspect its learning curve, you notice how different our approach is:


As you can see, it is a pure "learning algorithm", meaning it starts with zero knowledge and a near-zero score, and as it learns from more games it gets better and better scores, so after learning from 300,000 games it can achieve scores of about 9,000 points.

In contrast, Fractal AI is a pure-intelligence algorithm: it doesn't learn at all on its own (in its simplest incarnation), so to get a better score you need more thinking power (more CPU or a better implementation).

If we superimpose both graphs, the difference is quite evident:

The X-axis is a problem here: 100 vs 300k compresses Fractal AI's graph into a single vertical line at the left-most limit, but it casts a realistic image of the situation: Fractal AI, with the amount of CPU allowed (converted into the number of walkers and ticks used to think), consistently scores in the 5k - 20k range, with some peaks here and there, but it doesn't get better with experience the way learning algorithms do.
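
To make "walkers and ticks" more concrete, below is a deliberately simplified sampling lookahead, written as an illustration of the general idea rather than the actual Fractal AI: each walker restores the current emulator state, commits to a first action, plays random actions for a fixed number of ticks, and the first action whose walkers end with the best average score is chosen. It assumes the Atari envs of that gym version expose clone_full_state() / restore_full_state() on env.unwrapped.

```python
# Simplified walker lookahead -- an illustration only, NOT the real Fractal AI.
# Assumes an Atari env whose env.unwrapped exposes clone_full_state() and
# restore_full_state() (available in the gym Atari wrapper around 2017).
import numpy as np

def choose_action(env, n_walkers=50, ticks=30):
    root = env.unwrapped.clone_full_state()
    n_actions = env.action_space.n
    totals = np.zeros(n_actions)
    counts = np.zeros(n_actions)

    for _ in range(n_walkers):
        first = np.random.randint(n_actions)   # the action this walker evaluates
        env.unwrapped.restore_full_state(root)
        _, score, done, _ = env.step(first)
        for _ in range(ticks - 1):             # let the walker "think" ahead
            if done:
                break
            _, r, done, _ = env.step(env.action_space.sample())
            score += r
        totals[first] += score
        counts[first] += 1

    env.unwrapped.restore_full_state(root)     # leave the emulator as we found it
    return int(np.argmax(totals / np.maximum(counts, 1)))
```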

Adding learning is, of course, the next step, as it would make the algorithm orders of magnitude better (given time) and faster (learning allows us to cut down the number of walkers over time, saving most of the CPU needed without learning), but until then we will try to beat some more Atari games and other OpenAI environments we already worked on in the past (but never submitted), like the classic control pole-balancing and hill-climbing ones.

Update: Qbert also has an official score now (27/6/2017)!

16 comments:

  1. Hi Sergio, I have been following your work on Causal Entropic Forces for a while. This result is great; I see you are on your way to breaking all the records :D. I have a couple of questions that I hope you can answer. 1. When Pac-Man eats all the dots the game ends; shouldn't eating all the dots be considered equivalent to running out of futures? 2. I wrote an algorithm to try to solve the TSP, and although it seems to make sense, the search tree grows so fast that the time horizon has to be very small. Do you think fractal search could be implemented on a discrete problem?

    1. Hi, let me try to answer:

      1) Yes, when the game ends the variability goes down, but the score goes up, and Pac-Man considers a future with a better score to be better, so despite the low variability it is still interested in it. How you scale your scores so that in these cases the score prevails is important: here I rescale the scores of all the walkers to the range 1 to 2, so that a walker that doesn't finish sees one that does finish as twice as good; that is the trick.
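
      In code, that rescaling looks roughly like this (a minimal sketch; only the [1, 2] range comes from the explanation above, the rest is illustrative):

```python
import numpy as np

def rescale_scores(scores):
    """Map walker scores linearly into [1, 2], so the best walker looks
    exactly twice as 'good' as the worst one when futures are compared."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if hi == lo:                      # all walkers scored the same
        return np.full_like(scores, 1.5)
    return 1.0 + (scores - lo) / (hi - lo)
```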

      2) Yes, it can be done, but it is true that the tree becomes infinite and, being discrete, there are no "little by little" jumps, which makes it harder to solve. The trick with the infinite tree is deciding when a walker abandons its "branch" and moves to another one. This happens when the score in your branch is very low or when the average distance to the other walkers is very low. In both cases you tend to leave, so unpromising branches (low score) end up with a proportionally low density of walkers scanning them. Being proportional, interesting branches get scanned more without any extra computational effort, and that way I avoid scanning large regions. But if you use too few walkers, you can stop scanning a region that turns out to be good, so although the algorithm helps, you still need hundreds or thousands of walkers.
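
      As a rough illustration of that abandonment rule (the exact formula is not spelled out here, so take this as an assumed form rather than the published one): a walker tends to leave its branch when its score, or its distance to the rest of the swarm, is low relative to the best walker.

```python
import numpy as np

def leave_probability(my_score, my_distance, best_score, best_distance):
    """Illustrative (assumed) abandonment rule: the lower a walker's score or
    its distance to the other walkers, relative to the swarm's best values,
    the more likely it is to abandon its branch and jump to another walker."""
    score_term = my_score / max(best_score, 1e-8)            # low score -> small
    distance_term = my_distance / max(best_distance, 1e-8)   # crowded   -> small
    return float(np.clip(1.0 - score_term * distance_term, 0.0, 1.0))
```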

      For that last part it helps a lot if you show a network your past decisions so it learns to "guide" you along the most promising branches, but that is the "hybrid" mode I mentioned, which is not fully working yet.

      We have applied this method to some discrete problems: the taxi environment in OpenAI, for example, was solved this way -we didn't publish it- and in that case the tree was small, which is why other methods also solve it. But if you make the board twice as big in X and Y you get an unmanageable graph, an immense tree, and only if you know how to abandon branches in time can you scan it and solve it, even without finishing the scan.

      We also worked on the Unit Commitment Problem (UCP), and it solved the 10-unit case in a few seconds, although for larger cases you needed a memory and some learning in order to "learn" to solve them efficiently (that part was left unfinished, because it needs the hybrid mode and that wasn't ready yet).

  2. This is the TSP: https://en.wikipedia.org/wiki/Travelling_salesman_problem

    1. Yes, I know it. I spent a couple of afternoons looking at how it could be "converted", but it was complicated; we tried it a couple of times, somewhat naively, and it didn't work well on the first try. I never paid attention to it again, to be honest. It is not one of the problems that best fits this algorithm, or at least I haven't seen how yet, and it doesn't particularly appeal to me either; there are many others that interest me more, like Navier-Stokes lately.

    2. Note: the problem is basically that the solution space is a space of permutations, and the distance between permutations is not "well" defined: nearby solutions do not tend to have similar scores. That does not happen with other problems, which is why it is hard; you have to invent other distances with a more "physical" meaning, and I haven't figured out how. Any suggestion is welcome!

  3. It's quite impressive how well the intelligence algorithm works.

    However, correct me if I'm wrong, but I don't think it's a fair or apples-to-apples comparison when comparing it to other algorithms such as MontrealAI.

    I believe the Fractal AI basically boils down to sampling which possible futures would be attained by taking different actions at this instant, and then choosing the best action based on a specific value function (for example, based on entropy), and doing this repeatedly every instant. The algorithm can know how current actions produce different outcomes because it uses the game itself to have perfect knowledge of the future, not because it has learned to model the game. The MontrealAI on the other hand does not assume it knows the future perfectly, instead it tries to learn it.

    In other words, the problem can be separated into two separate problems: (1) modelling the system to be able to predict future outcomes, and (2) using this model to act intelligently. The Fractal AI is only working on the second problem (and it is solving it brilliantly), but it is not addressing the first problem. The Montreal AI is trying to tackle both at the same time.

    To have real applicability in real-life problems you need to solve both, otherwise you are limited to controlling systems for which you already have a mathematical model, and those are limited.

    Also, the first problem (learning to model the system) is actually the most challenging of the two, by far. The algorithm must somehow learn to understand the dynamics of a system based only on some sensor data. This is a gargantuan problem, and I believe is the real challenge.

    Don't get me wrong, I do believe that entropy-based intelligence is as close as we can get to the magic bullet, the be-all-end-all solution to problem #2, and the Fractal AI seems to be a quite efficient and effective implementation of this principle. I think this is a huge discovery, with enormous implications which we still do not understand. But to turn it into an AI that can be actually used in real-life scenarios we still need to solve problem #1, which is an enormous problem that we are still far from solving.

    1. Hi Juan, thanks for commenting, and yes, you are right in your analysis of both approaches: we are using a quite different strategy here, but I think it is a 100% fair, apples-to-apples comparison, as both algorithms play the same game, under the same rules, and are scored with the same metric.

      Your main concern is about directly scanning our near future instead of scanning our past games, but when you watch both algorithms play the last of their games, both are, at the end of the day, relying on the APIs to make "out of this game" simulations and build a strategy on them, just in different ways.

      If you could make the NN learn to a level where it could beat Fractal AI's first game played, you would beat us, but current NN methods cannot do it. So the real problem is the NN not getting above 10k points even after 300k games: it is not learning well, and the curve shows that it won't learn any better with another 300k games... you are blocked by your method.

      Fractal AI is solving only (2), that's true, but the environment is giving you (1) for free, both to Fractal AI and to the NNs used in other approaches. At the end of the day, we are replacing "random initial games" with "intelligently chosen/played initial games", so the Fractal AI "output" is perfectly curated to be used by a standard learning algorithm.

      In fact, Fractal AI is not better than a NN in all aspects: once the NN has learned, you can play a decent game using a very modest amount of CPU, while Fractal AI always uses a fixed amount of CPU, so playing will always take more time with Fractal AI than with a trained NN.

      NNs have speed advantages we will "assimilate" into the Fractal AI scheme: our NN will learn from the Fractal AI games, and the NN, once trained, will help Fractal AI get even better games at a lower CPU cost, and this cycle can be repeated ad infinitum.

      In the limit, we could switch off Fractal AI and use only the NN for the last games, being as fast as any other NN, but having learned to play perfectly well in a fraction of the time other NNs need to learn how to play "not so optimally": same final speed, 10x or 100x better scores.
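
      A sketch of the first half of that cycle, just to fix ideas (the network shape, the optimizer and the dataset format are assumptions, not our actual code): a small network is trained by plain supervised learning to imitate the actions Fractal AI chose in its recorded games.

```python
# Behavioural cloning sketch: teach a small network to imitate the actions
# Fractal AI chose in its recorded games. The architecture, optimizer and
# the (ram, action) dataset format are illustrative assumptions.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

def build_policy(n_actions, ram_size=128):
    model = Sequential([
        Dense(256, activation="relu", input_shape=(ram_size,)),
        Dense(256, activation="relu"),
        Dense(n_actions, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

def train_on_fractal_games(model, rams, actions, epochs=10):
    # rams: array of RAM observations; actions: the action Fractal AI took on each
    model.fit(np.asarray(rams) / 255.0, np.asarray(actions),
              epochs=epochs, batch_size=64)
    return model
```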

      If this is cheating, I will cheat as much as I can!

    2. Thanks for the interesting discussion Sergio.

      Well, let's put it this way: let's say I want to build a 'cheat box' I can take with me to my local arcade to beat the arcade games for me. The box will have a camera that looks at the Pac-Man screen and some servos to press the buttons. You don't have access to the Pac-Man source code, so you have to rely only on the camera input to win the game. The Fractal AI won't be able to work because it doesn't have a model of the game. The Montreal AI algorithm will learn to beat the game over time. (Maybe I'm wrong here because I can't find the source or documentation on the Montreal AI, but in general I believe this type of algorithm would do it.)

      For this reason I don't think it's fair to compare the performance of an algorithm that is trying to solve both problem 1 and problem 2 with one that only tries to solve problem 2. Solving both 1 and 2 is a much tougher problem and makes the algorithm usable in real life, while if you only solve problem 2 the algorithm cannot be used in real life and is not tackling the toughest problem of all.

      Maybe under the rules of the OpenAI gym it is allowed to solve just problem 2, so the Fractal algorithm is winning fair and square. I agree with this. But when I say it's not a fair comparison I'm talking in terms of engineering, not the OpenAI competition.

    3. You are right: in general terms it is unfair, as we only solve one of the two problems. Fractal AI (without learning code) cannot be used without a somewhat reliable simulation of the system; you don't need perfect information, just reliable information.

      That said, the actual strength of Fractal AI is squeezing as much juice as you can from any simulation you have, and this includes regular NNs, which in turn can learn faster by only watching "pro-level games".

      It is this "virtuous circle" between Fractal AI and NNs that makes a real apples-to-apples comparison outside the "fantasy land" of OpenAI, and knowing that NNs do a very nice job at learning and that Fractal AI generates high-quality samples... what could go wrong?

    4. I agree. Fractal AI is the solution for intelligence, NNs can be the solution for modelling; both working together could be very powerful.

      But I still have my reservations about how good NNs are at modelling. They're the best we have, but I think they're still too limited in many respects, especially for modelling dynamic systems.

    5. From the Fractal AI point of view, NNs are just a form of memory that helps the AI focus on the a-priori best decisions and not on the less promising ones, but I agree with you (as always!) that NNs are not the final answer.

      I am not a fan of NNs, but they are the fast lane to adding memory to Fractal AI; still, they are nasty black boxes limited by the number and kind of layers you choose beforehand.

      I am working on a fractal form of memory as a replacement for NNs. I have tested it on some cases and it does work, but right now NNs are fast and reliable thanks to GPUs and linear algebra, and losing that advantage has to be compensated with a really efficient implementation of whatever other kind of memory you choose, fractal memory included, so it is actually a long shot.

      But, like Fractal AI, its memory counterpart is also understandable and somewhat more traceable than NNs are, and it grows as needed, so you don't have to constrain it from the start with any particular topology; it "sculpts" its own form during the learning process.

      Sounds nice, but there are still some steps to get there, and as always, my side-project time is limited.

      My dream is Fractal AI + Fractal Memory + Fractal Consciousness, and the latter is, unexpectedly, quite a bit easier to define and code than memory, so NNs are going to be around for some time, I assume.

      Juan, I am really enjoying this conversation, some day we should have a beer and talk!

  4. Fractal memory, very interesting. So as I understand it, you want to use a NN (or alternatively a fractal algorithm) to add memory to the system. The idea is that it learns from previous experience, biasing it towards what we know works, instead of having to constantly re-process solutions that it already found.

    The problem that really interests me the most, however, is a bit different. My question is, how could we apply a Fractal AI to intelligently control -any- robot or system, without having to code the dynamic model of the robot or system into the algorithm? For example, how could I actually build that 'cheat box' to beat the actual Pac-Man arcade machine? How would the Fractal AI work if it doesn't have access to the game's API to sample future outcomes?

    Merely adding memory to the algorithm does not solve the problem. Memory only allows the algorithm to remember what previously worked, but it doesn't help it find new ways that have a good chance of working but haven't been tried yet. In more concrete terms, the Fractal AI won't know how to predict future outcomes; it only knows past history. If you limit the Fractal AI to utilize only past history, the resulting algorithm will work no better than a simple random search, where good solutions are found by chance and stored to be repeated later. This is basically what the Montreal AI does, and it's not that effective.

    More than memory, what you need is for the algorithm to learn to create a dynamic model of reality. This dynamic model not only fits what we have already experienced, but is able to extrapolate into the future and predict things that have never been experienced. This dynamic model can be used by the Fractal AI to sample into the future and find strategies that could work but have not been tried yet. This will make it more intelligent than a simple random search, and will approximate the results you are showing in OpenAI, though it won't be as good, because now the algorithm has to learn a model of the system instead of having direct access to it.
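
    In code terms, what I mean could be pictured like this (a sketch built entirely on assumptions; the model class, sizes and inputs are just illustration): replace the emulator calls in the lookahead with a learned transition function that predicts the next observation from the current one and the chosen action.

```python
# Illustrative learned forward model (an assumption, not an existing system):
# predicts the next RAM vector from the current RAM and a one-hot action, so a
# planner could call it in place of env.step() when no simulator is available.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

def build_dynamics_model(ram_size=128, n_actions=9):
    model = Sequential([
        Dense(512, activation="relu", input_shape=(ram_size + n_actions,)),
        Dense(512, activation="relu"),
        Dense(ram_size),              # predicted next RAM (simple regression)
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def predict_next_ram(model, ram, action, n_actions=9):
    one_hot = np.zeros(n_actions)
    one_hot[action] = 1.0
    x = np.concatenate([np.asarray(ram, dtype=float) / 255.0, one_hot])[None, :]
    return model.predict(x)[0]        # would stand in for env.step() in the planner
```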

    This is pretty much how the human rational mind works; we create a simulation of our world in our minds based on the data we obtain from our senses, then we extrapolate and think what would happen if we do different actions, and then we choose the action that we believe will optimize a function (happiness, wealth, etc).

    I've been studying this idea for a while now, and have found that NNs are very limited in their capacity to create these dynamic models. In fact, I think we are quite far from solving this problem. But once we do, we will have the holy grail: a true universal AI, one which can be applied to any system and will learn to act intelligently, without any tuning or specific coding.

    I would love to talk some day. I am from Puerto Rico, I am a mechanical engineer, I work on drone systems, and I am very interested in AI topics.

    -J

  5. Will you release the source code? When?

    1. Yes, we plan to release it after the summer (and before the end of 2017), but before that we want to add "learning" to the mix and try it out with most of the Atari environments, so we make sure we release final code that actually improves over time and not just a mock-up of the idea.

  6. I'm not sure if you have seen this thread, but here they are discussing your findings. Some mention a similar argument to the one I made above: “Yes - the issue is that the work is currently presented as requiring "no training", but it has simply relocated that problem to constructing a perfect simulation of the environment. It then uses the fact that current benchmarking systems have available simulations to "cheat" rather than learning that function itself. One of the most difficult and interesting parts of reinforcement learning is constructing the function that determines the evolution of the system. If you know the evolution function a priori the problem is mostly trivial - i.e. alpha-beta search, graph searching, etc.”

    https://news.ycombinator.com/item?id=14709896

    I don't mean this as a discouragement but as constructive input. I love your work, looking forward to the next post.

    1. Thanks Juan, I read it a while ago, and I just don't know why they don't come here and ask me whatever they want to know. I won't be registering here and there to answer their questions, but I can certainly clarify any conceptual question for them.
