## Wednesday, 13 June 2018

### Graph entropy 5: Relations

After some introductory posts (that you should have read first, starting here) we face the main task of defining the entropy of a graph, something looking like this:

## Tuesday, 12 June 2018

### Graph entropy 4: Distribution vs Graph

In previous posts, after complaining about Gibbs cross-entropy and failing to find an easy fix, I presented a new product-based formula for the entropy of a probability distribution, but now I plan to generalise it to a graph.

Why is it so great to have an entropy for graphs? Because distributions are special cases of graphs, but many real-world cases are not distributions, so the standard entropy cannot be applied correctly to those cases.

## Graph vs distribution

Let's take a simple but relevant example: there is a parking lot with 500 cars and we want to collect information about the kind of engines they use (gas engines and/or electric engines) to finally present a measurement of how much information we have. We will assume that 350 of them are gas-only cars, 50 are pure electric and 100 are hybrids (but we don't know this in advance).
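As a quick sketch of the setup (the counts come from the text; names are mine, and normalising counts into probabilities is all the "distribution view" needs):

```python
# Engine counts from the parking-lot example (in the story, not known in advance).
counts = {"gas_only": 350, "electric": 50, "hybrid": 100}
total = sum(counts.values())                        # 500 cars
P = {kind: n / total for kind, n in counts.items()}
print(P)  # {'gas_only': 0.7, 'electric': 0.1, 'hybrid': 0.2}
```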

### Using distributions

If we were limited to probability distributions -as in Gibbs entropy- we would say there are three disjoint subgroups of cars ('Only gas', 'Only electric', 'Hybrid'), and that the probabilities of a random car being in each subgroup are P = {p_1 = 350/500 = 0.7, p_2 = 50/500 = 0.1, p_3 = 100/500 = 0.2}, so the result of the experiment of inspecting the engines of those cars has a Gibbs entropy of:

H_G(P) = -Σ(p_i × log(p_i)) = 0.2496 + 0.3218 + 0.2302 = **0.8018**

If we use the new H_2 and H_3 formulas, we get a different result, but the difference is just a matter of scale:

H_2(P) = ∏(2 - p_i^p_i) = 1.2209 × 1.2752 × 1.2056 = 1.8771

H_3(P) = 1 + log(1.8771) = **1.6297**

## Monday, 11 June 2018

### Graph entropy 3: Changing the rules

After showing that the standard Gibbs cross-entropy was flawed and trying to fix it with an also flawed initial formulation of a "free-of-logs" entropy, we faced the problem of finding a way to substitute a sum by a product without breaking anything important. Here we go...

When you define an entropy as a sum, each of the terms is supposed to be "a little above zero": small and positive (ε ≥ 0), so when you add it to the entropy it can only slightly increase it. Also, when you add a new probability term with p=0 or p=1, you need this new term to be 0 so it doesn't change the resulting entropy at all.

Conversely, when you want to define an entropy as a product of terms, they need to be "a little above 1", of the form (1+ε), and the terms associated with the extreme probabilities p=0 and p=1 cannot change the resulting entropy, so they need to be exactly 1.

In the previous entropy this ε term was defined as (1 - p_i^p_i), and now we need something like (1+ε), so why not just try (2 - p_i^p_i)?

Let us be naive again and propose the following formulas for entropy and cross-entropy:

H_2(P) = ∏(2 - p_i^p_i)

H_2(Q|P) = ∏(2 - q_i^p_i)

Once again it looks too easy to be worth researching, but once again I did, and it proved (well, my friend José María Amigó actually proved it) to be a perfectly defined generalised entropy of a really weird class, with Hanel-Thurner exponents (0, 0), something never seen in the literature.
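A minimal sketch of these two formulas in Python (function and variable names are mine; Python already evaluates 0**0 as 1, matching the convention used here):

```python
from math import prod

def H2(P, Q=None):
    """H2(P) = prod(2 - p_i**p_i); with Q given, the cross-entropy
    H2(Q|P) = prod(2 - q_i**p_i). Python treats 0**0 as 1."""
    Q = P if Q is None else Q
    return prod(2 - q**p for p, q in zip(P, Q))

P = [0.7, 0.1, 0.2]                       # the parking-lot distribution
print(round(H2(P), 3))                    # ≈ 1.877
print(H2(P, [0.0, 0.5, 0.5]))             # finite even when some q_i = 0
```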

As you can see, this new cross-entropy formula is perfectly well defined for any combination of p_i and q_i (in this context, we are assuming 0^0 = 1) and, if you graphically compare both cross-entropy terms, you find that the Gibbs term is unbounded (when q=0 its value goes up to infinity):

ε_G(p, q) = -(p × log(q))

In the new multiplicative form of entropy, this term is 'smoothed out' and nicely bounded between 1 and 2:

ε_2(p, q) = (2 - q^p)
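The bounded-vs-unbounded contrast can be checked numerically; this is just an illustrative sketch (names are mine):

```python
from math import log

def eps_gibbs(p, q):
    return -(p * log(q))   # blows up as q -> 0

def eps_2(p, q):
    return 2 - q**p        # stays in [1, 2], with 0**0 == 1

for q in (0.5, 1e-3, 1e-9):
    print(f"q={q}: gibbs={eps_gibbs(0.5, q):.3f}, new={eps_2(0.5, q):.5f}")
```

At q = 0 exactly, ε_2(p, 0) = 2 for any p > 0, while the Gibbs term is undefined.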

## Sunday, 10 June 2018

### Graph Entropy 2: A first replacement

As I commented in a previous post, I found that there were cases where cross-entropy and KL-divergence were not well defined. Unluckily, in my theory those cases were the norm.

I had two options: Not even mentioning it, or try to go and fix it. I opted for the first as I had no idea of how to fix it, but I felt I was hiding a big issue with the theory under the carpet, so one random day I tried to find a fix.

Ideally, I thought, I would only need to replace the (p_i × log(p_i)) part with something like (p_i^p_i), but it was such a naive idea I almost gave up before having a look. Still, I did: how different do those two functions look when plotted on their domain interval (0, 1)?

Wow! They were just mirror images of each other! In fact, you only need a small change to match them: (1 - (p_i^p_i)):

### Graph entropy 1: The problem

This post is the first in a series about a new form of entropy I came across some months ago while trying to formalise Fractal AI, its possible uses as an entropy of a graph, and how I plan to apply it to neural network learning and even to generating conscious behaviour in agents.

## Failing to use Gibbs

The best formula so far accounting for the entropy of a discrete probability distribution P = {p_i} is the so-called Gibbs-Boltzmann-Shannon entropy:

H(P) = -k × Σ(p_i × log(p_i))

In this famous formula, the set of all the possible next states of the system is divided into a partition P with n disjoint subsets, with p_i representing the probability of the next state being in the i-th element of this partition. The constant k can be anything positive, so we will just assume k=1 from now on.

Sometimes we also want to compare two distributions P = {p_i} and Q = {q_i}: the cross-entropy of Q given P, H(Q|P), is a measure of how different they both are or, in terms of information, how much new information is in knowing Q if I already know P.

In that case, the Gibbs formulation for the cross-entropy goes like this:

H(Q|P) = -Σ(p_i × log(q_i))
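As a minimal sketch of both formulas (natural log, k = 1; list indices stand in for the partition, and names are mine):

```python
from math import log

def gibbs_entropy(P):
    """H(P) = -sum(p_i * log(p_i)), taking 0*log(0) = 0."""
    return -sum(p * log(p) for p in P if p > 0)

def gibbs_cross_entropy(P, Q):
    """H(Q|P) = -sum(p_i * log(q_i)); needs q_i > 0 wherever p_i > 0."""
    return -sum(p * log(q) for p, q in zip(P, Q) if p > 0)

P = [0.7, 0.1, 0.2]
print(round(gibbs_entropy(P), 4))           # 0.8018
print(round(gibbs_cross_entropy(P, P), 4))  # same value: H(P|P) = H(P)
```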
Cross-entropy is the real formula defining an entropy, as the entropy of P can be defined as its own cross-entropy H(P|P), which by Gibbs' inequality is the minimal value of H(Q|P) over all possible distributions Q:

H(P) = H(P|P) ≤ H(Q|P)

As good and general as it may look, this formula hides a very important limitation I came across when trying to formalise the Fractal AI theory: if for some index you have q_i = 0 while p_i is not zero, the above formula is simply not defined, as log(0) is as undefined as 1/0 is.

## Wednesday, 14 March 2018

### Fractal AI "recipe"

Now that the algorithm is public and people are trying to catch up, it can be of help -mainly to us- to have a simplistic schema of what it does, with no theories nor long formulae, nothing extra but the raw skeleton of the idea, so you can read it, get the idea, and go on with your life.

I will try to simplify it to the sweet point of becoming... almost smoke!

__Ingredient list:__

- A system you can -partially- inspect to know its actual state -or part of it- as a vector.
- A simulation you can ask: "what will happen if the system evolves from initial state X0 for a time dt?"
- A distance between two states, so our "state space" becomes a metric space.
- A number of "degrees of freedom" the AI can push.
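The ingredient list can be sketched as interfaces; everything here (names, signatures) is illustrative and not taken from the actual FractalAI repository:

```python
from typing import Protocol, Sequence

State = Sequence[float]

class System(Protocol):
    """Ingredient 1: a system whose (partial) state can be read as a vector."""
    n_actions: int  # ingredient 4: degrees of freedom the AI can push
    def observe(self) -> State: ...

class Simulation(Protocol):
    """Ingredient 2: 'what happens if the system evolves from X0 for a dt'."""
    def step(self, x0: State, action: int, dt: float) -> State: ...

def distance(a: State, b: State) -> float:
    """Ingredient 3: a metric turning the state space into a metric space."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

print(distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```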

## Thursday, 15 February 2018

### Fractal AI goes public!

Today we are glad to announce that we are finally releasing the "Fractal AI" artificial intelligence framework to the public!

This first release includes:

- *Fractal AI: A fragile theory of intelligence*: a short, nice-to-read book in PDF with theory, algorithm, pseudo-code, experiments, etc.
- github.com/FragileTheory/FractalAI, with working Python code to beat almost all of the current Atari games.

We have modified Fractal AI a little, so now it is more powerful than ever; in fact, we have been able to beat many of the current records (from state-of-the-art neural networks like DQN or A3C) with about 1000 samples per step (and remember: no learning, no adaptation to any game, same code for all of them).

We are especially proud of beating even some absolute human world records, but hey, it was going to happen anyhow!

This Ice Hockey game scored 64... one goal every 3 seconds or more! The absolute human record is 36, and the best previous AIs scored 10.5. I also have an immortal centipede running on my PC for a week; it is scoring 1,241,988 right now and has 6 extra lives... the human record is 1,301,000 and the best AIs scored 25,000, so tomorrow I will kill the game if it is still alive.
