Entropic and Fractal Intelligence: Hacking Reinforced Learning

Wednesday, 24 October 2018

Hacking Reinforced Learning

My good friend and close colleague Guillem has had a really busy year giving talks about Reinforcement Learning at several events, such as Piter Py 2017 (Saint Petersburg, Russia), EuroPython 2018 (Edinburgh, UK), PyConES 2018 (Málaga, Spain) and PyData Mallorca (among others!), introducing Fractal Monte Carlo to a broad audience.

All the talks dealt with RL, but the ones held at EuroPython (in English) and PyConES (in Spanish) were both about "hacking RL": introducing the Fractal Monte Carlo (FMC) algorithm as a cheap and efficient way to generate lots of high-quality rollouts of the game or system being controlled.

Basically, standard RL methods start by generating random rollouts of the game and then slowly learn to mimic the most successful episodes among them. The expectation is that, over time, newly generated rollouts (which use the learned policy as a prior instead of purely random choices) will tend to improve in quality, i.e. in game score.

This will eventually happen, but very slowly and not necessarily toward a globally optimal policy.
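
To make that loop concrete, here is a minimal sketch of the "generate rollouts, then mimic the best ones" idea on a toy game. Everything in it is an assumption made for illustration: the game (a 20-step walk where the score is the final position), the one-parameter policy `p_right`, and the update rule, which is essentially a cross-entropy-method style update and not any specific algorithm from the talks.

```python
import random

HORIZON = 20  # steps per episode in this toy game (illustrative value)

def rollout(p_right, rng):
    """Play one episode: step right (+1) with probability p_right, else left (-1)."""
    actions = [+1 if rng.random() < p_right else -1 for _ in range(HORIZON)]
    return actions, sum(actions)  # score = final position

def train(iterations=15, episodes=200, elite_frac=0.2, seed=0):
    """Repeatedly sample episodes and nudge the policy toward the best ones."""
    rng = random.Random(seed)
    p_right = 0.5          # start as a purely random policy
    history = [p_right]
    for _ in range(iterations):
        batch = [rollout(p_right, rng) for _ in range(episodes)]
        batch.sort(key=lambda episode: episode[1], reverse=True)
        elite = batch[: int(episodes * elite_frac)]
        # "Mimic" the most successful episodes: match how often they stepped right.
        rights = sum(a == 1 for actions, _ in elite for a in actions)
        target = rights / (len(elite) * HORIZON)
        p_right += 0.5 * (target - p_right)  # smoothed, deliberately slow update
        history.append(p_right)
    return history

if __name__ == "__main__":
    for i, p in enumerate(train()):
        print(f"iteration {i:2d}: p_right = {p:.3f}")
```

Even in this trivial game the policy needs several rounds of sampling to drift from coin-flipping toward "always step right", because each round can only learn from whatever the current policy happened to produce.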

Can we hack this standard method and get a faster, more reliable training phase? Yep! As long as we can predict the system's next state with some level of certainty (call it having an approximate, probabilistic simulator of your system), we can use it to generate a set of very high-scoring games to learn from.
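
As a rough illustration of why a simulator helps, the sketch below plays the same kind of toy game but picks each action by looking ahead with a noisy simulator. Note that this is a plain Monte Carlo lookahead planner used as a stand-in, not the FMC algorithm itself; the `simulate` function, its `slip` probability and all other names are hypothetical, and for simplicity the planner's simulator is also used as the real system.

```python
import random

ACTIONS = (-1, +1)  # step left or right
HORIZON = 20        # steps per game (illustrative value)

def simulate(position, action, rng, slip=0.1):
    """Hypothetical stochastic simulator: the move is reversed with probability `slip`."""
    if rng.random() < slip:
        action = -action
    return position + action

def estimate_return(position, first_action, steps_left, rng, samples=20):
    """Average final position after `first_action` followed by random moves."""
    total = 0
    for _ in range(samples):
        pos = simulate(position, first_action, rng)
        for _ in range(steps_left):
            pos = simulate(pos, rng.choice(ACTIONS), rng)
        total += pos
    return total / samples

def random_policy(position, steps_left, rng):
    return rng.choice(ACTIONS)

def planning_policy(position, steps_left, rng):
    # Pick the action whose simulated continuations score best on average.
    return max(ACTIONS, key=lambda a: estimate_return(position, a, steps_left, rng))

def play(policy, rng):
    """Play one full game with the given policy and return its score."""
    position = 0
    for t in range(HORIZON):
        action = policy(position, HORIZON - t - 1, rng)
        position = simulate(position, action, rng)
    return position

def average_score(policy, games=20, seed=0):
    rng = random.Random(seed)
    return sum(play(policy, rng) for _ in range(games)) / games

if __name__ == "__main__":
    print("random games: ", average_score(random_policy))
    print("planned games:", average_score(planning_policy))
```

The planned games score far higher than the random ones from the very first game, with no learning involved; those games are the kind of high-quality examples a policy could then be trained to imitate.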

Here is the first talk: "Hacking Reinforced Learning" at EuroPython 2018 (in English):