All the talks were about RL, but the ones given at EuroPython (English) and PyConES (Spanish) both focused on "hacking RL" by introducing the Fractal Monte Carlo (FMC) algorithm as a cheap and efficient way to generate lots of high-quality rollouts of the game/system being controlled.
Basically, standard RL methods start by generating random rollouts of the game, then slowly learn to mimic the most successful episodes among those rollouts, expecting that, over time, the quality (i.e. the game score) of the newly generated rollouts (now produced using the learned policy as a prior instead of purely random choices) will tend to improve.
This will eventually happen, but very slowly, and not necessarily converging toward a globally optimal policy.
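To make that loop concrete, here is a minimal sketch in the spirit of the cross-entropy method. The environment is a made-up one-dimensional walk (not any of the environments from the talks): the policy is just the probability of moving right, and each iteration it is refit to imitate the top-scoring rollouts.

```python
import random

# Toy stand-in for a game: walk on a line, score = how far right you end up.
# (Hypothetical example environment, just to make the loop concrete.)
def run_episode(policy_right_prob, horizon=20):
    pos, actions = 0, []
    for _ in range(horizon):
        go_right = random.random() < policy_right_prob
        actions.append(go_right)
        pos = min(10, max(0, pos + (1 if go_right else -1)))
    return pos, actions  # score, trajectory

def train(iterations=10, rollouts=100, elite_frac=0.2):
    policy = 0.5  # start with purely random choices
    for it in range(iterations):
        episodes = [run_episode(policy) for _ in range(rollouts)]
        episodes.sort(key=lambda e: e[0], reverse=True)
        elite = episodes[: int(rollouts * elite_frac)]
        # Mimic the most successful episodes: the new policy is the
        # frequency of "go right" among the elite trajectories.
        right_moves = sum(sum(a) for _, a in elite)
        total_moves = sum(len(a) for _, a in elite)
        policy = right_moves / total_moves
        mean_elite = sum(s for s, _ in elite) / len(elite)
        print(f"iter {it}: mean elite score {mean_elite:.1f}, policy {policy:.2f}")
    return policy

if __name__ == "__main__":
    train()
```

The policy here is a single number, so "learning to mimic" is just a frequency count; in a real setup it would be a neural network trained on the elite episodes, but the shape of the loop is the same.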
Can we hack this standard method and get a faster, more reliable training phase? Yep! As long as we can predict the system's next state with some level of certainty (call it having an approximate, probabilistic, stochastic simulator of your system), we can use it to generate a set of very high-scoring games to learn from.
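Below is a deliberately simplified sketch of that idea, not the actual FMC algorithm from the talks: it reuses the toy line-walk game above and a noisy transition model as the "approximate simulator", and picks each action by sampling a few simulated futures. The rollouts it produces score well above random play, so they could be fed to a learner as training data.

```python
import random

# Noisy "simulator": predicts the next position, standing in for an
# approximate, probabilistic model of the real system.
# (Illustrative sketch only; the real FMC algorithm is more sophisticated.)
def noisy_next_pos(pos, go_right, noise=0.1):
    step = 1 if go_right else -1
    if random.random() < noise:   # the model is only approximately right
        step = -step
    return min(10, max(0, pos + step))

def simulate_score(pos, first_action, horizon, samples=8):
    """Average score of sampled futures that start with `first_action`."""
    total = 0.0
    for _ in range(samples):
        p = noisy_next_pos(pos, first_action)
        for _ in range(horizon - 1):
            p = noisy_next_pos(p, random.random() < 0.5)
        total += p
    return total / samples

def generate_good_rollout(horizon=20):
    """Use the simulator to pick actions, producing a high-scoring episode
    that can later be used as a training example for the policy."""
    pos, actions = 0, []
    for t in range(horizon):
        remaining = horizon - t
        best = max((False, True),
                   key=lambda a: simulate_score(pos, a, remaining))
        actions.append(best)
        pos = min(10, max(0, pos + (1 if best else -1)))  # real transition
    return pos, actions

if __name__ == "__main__":
    scores = [generate_good_rollout()[0] for _ in range(20)]
    print("mean score of simulator-guided rollouts:", sum(scores) / len(scores))
```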
Here is the first talk, "Hacking Reinforced Learning", at EuroPython 2018 (English):
There is also a Spanish version from PyConES 2018:
And the more generic talk "Reinforced Learning for developers" at Piter Py 2017 (English):
If you prefer Russian, there is a real-time translated version too: