Monday 4 February 2019

The Gordian Knot of AGI

All the people working in AI I have ever met share the same feeling you can read here and there: what we do with Deep Learning is fantastic and opens a number of totally new fields for us, and Reinforcement Learning goes one step further, but it doesn't feel as if it were "real" intelligence, not like the general and "plastic" intelligence we feel flowing in our own brains. It feels more like a powerful tool in our untrained hands.

I totally agree. Deep learning is very good at making sense of our "observations" of the world, building an internal state that, in practical terms, can be used as if we had direct access to the real system state, and also at predicting the most probable next states of the system. That is true. But even if we could make it so perfect that our agent had magical access to the real system state and a perfect simulation of it, even in that ideal case, we don't really know what to do with this valuable information.


A real AGI needs to address two very different problems before it can be considered complete.

First, it must learn to extract a usable state out of a series of observations, and also to predict our next observation of the system, given that we take this or that decision and perform the corresponding action.
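
To make the split concrete, here is a minimal Python sketch of what this first part gives you. The class and its toy internals are illustrative placeholders for a trained deep network, not a real implementation:

    import numpy as np

    class WorldModel:
        def encode(self, observations):
            # Extract a usable state out of a series of raw observations.
            # (A toy running average stands in for a learned encoder.)
            return np.mean(observations, axis=0)

        def predict(self, state, action):
            # Predict the most probable next state, given the action taken.
            # (A toy linear guess stands in for a learned dynamics model.)
            return state + 0.1 * np.asarray(action)

    model = WorldModel()
    state = model.encode([np.array([0.0, 1.0]), np.array([0.2, 0.8])])
    next_state = model.predict(state, action=[1.0, 0.0])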

But once you have it, you need to decide on your next action, and this step is even more important in showing real intelligence than the previous one: if an agent were able to exactly predict its next observation, but were unable to take the most "intelligent" decision from it, then all that learning would be wasted; whereas if another agent, having incomplete knowledge of its environment and its evolution, were able to take meaningful and intelligent decisions, we would all agree that the second one is more intelligent than the first.

The state of the art in AI is mainly focused on the first problem, the sensory part, where you learn to deal with a stream of raw data, make sense out of it, and predict its sequence; there is basically nothing as general and powerful for how to actually decide on and choose our next action.

We have control theory, we have Monte Carlo Tree Search (MCTS), and we have Reinforcement Learning (RL) and AlphaZero, so yes, there is something, but to be honest, they are not up to the task of teaming up with neural networks and deep learning to provide a general, easy and computationally efficient way to take optimal decisions.

Reinforcement Learning (RL), being the main contender here, lacks a truly powerful way of scanning the future outcomes of the actions being considered; instead, it is based on learning, from memories of past episodes, to predict a "q-value", the "expected reward" accumulated after a period.
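
To make this concrete, here is a minimal sketch of tabular q-learning, the textbook version of this idea (the names are mine). Note there is no look-ahead anywhere, only memories of past transitions:

    import random
    from collections import defaultdict

    Q = defaultdict(float)        # q-value of each (state, action) pair
    alpha, gamma = 0.1, 0.99      # learning rate and discount factor

    def update(state, action, reward, next_state, actions):
        # One-step update towards the observed reward plus the best
        # expected reward from the next state, taken from memory.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])

    def act(state, actions, epsilon=0.1):
        # Take the action with the highest q-value (epsilon-greedy).
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])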

Taking the action with the highest q-value works to some extent, but not that well in practice. Adding MCTS helps a lot (AlphaZero uses it with great results), but you cannot scan a very complex future landscape using a one-path-at-a-time scheme and be efficient at the same time, and, in my opinion, this is the point where we lose faith in the "reality" of the intelligence we build.

On the other hand, the Fractal Monte Carlo (FMC) algorithm provides the second part of an AGI in compact and efficient code: it is able to scan any future landscape, works on continuous or discrete decision spaces, and is general enough to fit the requirements of the second task. So the question is: will the mix of a pair of equally efficient algorithms, one for the learning part (DL) and a second one for the decision part (FMC), produce the synergies and couplings needed to eventually show the AGI behaviour we want?

My answer is yes, and for a question of symmetry: the algorithms we use for AI and RL (neural networks, q-learning, bayesian inference, etc.) are all based on learning from memories taken from past episodes. We don't use any ahead-looking algorithm, except for the Monte Carlo methods we could add to the mix. But all the Monte Carlo variations in use base their inner workings on plotting one new path at a time, considering the possible futures one after another.
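
In code, that one-future-at-a-time structure looks roughly like this (the model.step(state, action) simulator interface is an assumption for the sketch, not a fixed API):

    import random

    def rollout(model, state, actions, depth):
        # Plot ONE possible future: a single linear, causally connected
        # path of states, chosen at random.
        total = 0.0
        for _ in range(depth):
            state, reward = model.step(state, random.choice(actions))
            total += reward
        return total

    def monte_carlo_decide(model, state, actions, depth=20, n_paths=100):
        # Consider the possible futures one after another: every rollout
        # is independent work, so covering a branching landscape this way
        # needs exponentially many paths.
        scores = {}
        for action in actions:
            total = 0.0
            for _ in range(n_paths):
                next_state, reward = model.step(state, action)
                total += reward + rollout(model, next_state, actions, depth - 1)
            scores[action] = total
        return max(scores, key=scores.get)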

This approach is not able to deal with complex spaces; it just doesn't deliver. Only by building all the paths simultaneously, and allowing the different paths to cooperate, can we do it, but this involves building fractal trees of paths instead of threads of linear, causally connected paths of states. A fractal look-ahead algorithm is, in my opinion, the actual Gordian Knot of AGI.
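
For contrast, here is a deliberately simplified sketch of that idea: a swarm of walkers steps forward in lockstep, cooperating by cloning onto more promising companions. This is only the skeleton; the real FMC cloning and decision rules are more refined than what is shown here, and model.step is again an assumed interface:

    import random

    def fmc_decide(model, state, actions, n_walkers=50, depth=20):
        # All paths are built simultaneously, growing a fractal tree of
        # paths instead of independent linear threads.
        walkers = [{"state": state, "first": None, "reward": 0.0}
                   for _ in range(n_walkers)]
        # Cost is linear in depth: doubling the depth doubles the time.
        for _ in range(depth):
            for w in walkers:
                action = random.choice(actions)
                w["state"], r = model.step(w["state"], action)
                w["reward"] += r
                if w["first"] is None:
                    w["first"] = action
            # Cooperation step: dead-end paths jump onto better ones, so
            # effort concentrates where the future landscape looks good.
            for w in walkers:
                other = random.choice(walkers)
                if other["reward"] > w["reward"]:
                    w.update(state=other["state"], first=other["first"],
                             reward=other["reward"])
        # Simplified decision rule: pick the initial action the swarm
        # ended up concentrated on.
        firsts = [w["first"] for w in walkers]
        return max(set(firsts), key=firsts.count)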

But an AGI without a consciousness is an AGI whose goals have to be externally given. This is general AI in some sense, but not a real AGI as we humans are: that is one step beyond what we will get, but not so far away!

2 comments:

  1. Might "A Monte Carlo AIXI Approximation" be improved by FMC? See: https://arxiv.org/abs/0909.0801

    1. Yes. Basically, any algorithm doing planning based on a Monte Carlo Tree Search can replace this MCTS with a Fractal Monte Carlo equivalent.

      Basically, you get much better performance in the planning part, allowing you to scan much deeper into the future (as doubling the depth just costs double the time in FMC), and a more reliable decision (as it is based on choosing the highest-entropy one).
