When you define an entropy as a sum, each of the terms is supposed to be "a little above zero": small and positive, ε ≥ 0, so that adding it to the entropy can only slightly increase it. Also, when you add a new probability term with p = 0 or p = 1, this new term needs to be 0 so it doesn't change the resulting entropy at all.
Conversely, when you want to define an entropy as a product of terms, they need to be "a little above 1", of the form (1 + ε), and the terms associated with the extreme probabilities p = 0 and p = 1 cannot change the resulting entropy, so they need to be exactly 1.
In the previous entropy this ε term was defined as (1 − pᵢ^pᵢ), and now we need something of the form (1 + ε), so why not just try (2 − pᵢ^pᵢ)?
Let us be naive again and propose the following formulas for entropy and cross-entropy:
H2(P) = ∏ᵢ (2 − pᵢ^pᵢ)
H2(Q|P) = ∏ᵢ (2 − qᵢ^pᵢ)
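To make the definitions concrete, here is a minimal numerical sketch (the function names h2 and h2_cross are mine, not from any library):

```python
import numpy as np

def h2(p):
    # Multiplicative entropy: H2(P) = prod_i (2 - p_i^p_i).
    # Python/NumPy already evaluate 0**0 as 1, so the terms for
    # p_i = 0 and p_i = 1 are exactly 1 and leave the product unchanged.
    p = np.asarray(p, dtype=float)
    return np.prod(2.0 - p ** p)

def h2_cross(p, q):
    # Multiplicative cross-entropy: H2(Q|P) = prod_i (2 - q_i^p_i).
    # Every term lies in [1, 2], whatever the values of p_i and q_i in [0, 1].
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.prod(2.0 - q ** p)

p = [0.5, 0.25, 0.25, 0.0]
q = [0.25, 0.25, 0.25, 0.25]
print(h2(p), h2_cross(p, q))   # both finite, both >= 1
```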
Once again it looked too easy to be worth researching, but once again I did, and it proved (well, my friend José María Amigó actually proved it) to be a perfectly well-defined generalised entropy of a really weird class, with Hanel-Thurner exponents (0, 0), something never seen in the literature.
As you can see, this new cross-entropy formula is perfectly well defined for any combination of pᵢ and qᵢ (in this context, we are assuming 0^0 = 1) and, if you graphically compare both cross-entropy terms, you find that, in the Gibbs version, the term is unbounded (when q = 0 its value goes up to infinity):
εG(p, q) = −(p × log(q))
In the new multiplicative form of entropy, this term is 'smoothed out' and nicely bounded between 1 and 2:
ε2(p, q) = 2 − q^p
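A quick way to see this boundedness is to evaluate both terms as q approaches 0 (a small exploratory sketch of my own, with p fixed at 0.5 just for illustration):

```python
import numpy as np

p = 0.5
for q in (0.5, 0.1, 1e-3, 1e-9, 0.0):
    gibbs_term = -p * np.log(q) if q > 0 else float("inf")  # -p*log(q) diverges as q -> 0
    mult_term = 2.0 - q ** p                                 # always stays between 1 and 2
    print(f"q = {q:>7}:  Gibbs term = {gibbs_term:12.4f}   multiplicative term = {mult_term:.4f}")
```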
Just to make sure it was what it looked like, I directly -and naively- sent an email to Stefan Thurner asking for his expert opinion: surprisingly, he kindly answered me! It was a big surprise for him too, so it was official: I had quite a strange entropy on my hands.
The next question was whether or not this entropy had a nice replacement for the 4th axiom or, in other words, whether the entropy of the Cartesian product P×Q of two independent distributions, H(P, Q), was bounded by some elegant combination of the entropies of P and Q, H(P) and H(Q).
It turned out that I needed to modify it a little bit in order to get a decent separability property, so I defined a third generalised entropy (this is the last one, I promise!):
H3(P) = 1 + log(H2(P)) = 1 + log(∏ᵢ (2 − pᵢ^pᵢ))
H3(Q|P) = 1 + log(H2(Q|P)) = 1 + log(∏ᵢ (2 − qᵢ^pᵢ))
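Since H3 is just 1 plus the logarithm of H2, it can also be computed as 1 plus a sum of logs; a small sketch under the same assumptions as before (the names h3 and h3_cross are mine):

```python
import numpy as np

def h3(p):
    # H3(P) = 1 + log(H2(P)) = 1 + sum_i log(2 - p_i^p_i)
    p = np.asarray(p, dtype=float)
    return 1.0 + np.sum(np.log(2.0 - p ** p))

def h3_cross(p, q):
    # H3(Q|P) = 1 + log(H2(Q|P)) = 1 + sum_i log(2 - q_i^p_i)
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 1.0 + np.sum(np.log(2.0 - q ** p))

print(h3([1.0, 0.0, 0.0]))  # terms for p = 0 or p = 1 give log(1) = 0, so H3 = 1
print(h3([0.5, 0.5]))       # about 1.514
```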
If you now compare the four generator functions, one for the Gibbs entropy and three more for the new multiplicative ones, you can see how closely related they are:
At first glance it really didn't change much, except that the Hanel-Thurner exponents were again (1, 1) and it started to show some interesting separability properties and inequalities, but the most interesting property was hidden: you could combine different distributions in a tree-like structure and still get a coherent value of entropy... it seemed to be a graph entropy rather than a distribution entropy.
This was far from evident. You needed to define some strange combination of those two multiplicative forms of entropy, H2 and H3, before you were able to see the "graph side" of it.
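That combination is left for the next post, but if you want to play with the separability question yourself, here is a small exploratory probe (entirely my own, not the construction hinted at above) that compares H3 of two independent distributions with H3 of their Cartesian product:

```python
import numpy as np

def h3(p):
    # H3(P) = 1 + sum_i log(2 - p_i^p_i)
    p = np.asarray(p, dtype=float)
    return 1.0 + np.sum(np.log(2.0 - p ** p))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.7, 0.3])

# Joint distribution of two independent variables: all the products p_i * q_j
joint = np.outer(p, q).ravel()

print("H3(P)   =", h3(p))
print("H3(Q)   =", h3(q))
print("H3(PxQ) =", h3(joint))
```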
In the next post I will be showing how to define this entropy over any directed -and acyclic- graph, so we will basically be extending the concept of the entropy of a distribution to a much broader setting.