All the talks were about RL, but the ones given at EuroPython and PyConES (the latter, in Spanish, is not online yet) were both about "hacking RL": they introduce the Fractal Monte Carlo (FMC) algorithm as a cheap and efficient way to generate lots of high-quality rollouts of the game or system being controlled.
Basically, standard RL methods generate random rollouts of the game and then slowly learn to mimic the most successful episodes among them, expecting that, over time, the rollout quality (i.e. the game score) will tend to improve.
This will eventually happen, but very slowly and not necessarily toward a globally optimal policy.
Can we hack this standard method and get a faster, more reliable training phase? Yep! As long as we can predict the system's next state with some level of certainty (call it having an approximate, probabilistic simulator of your system), we can use it to generate a set of very high-scoring games to learn from.
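To make the idea concrete, here is a minimal sketch of the difference between purely random rollouts and rollouts guided by a simulator. Note that this is not FMC: the planner below is a plain random lookahead, swapped in only to illustrate the idea, and the toy 1-D system, `TARGET`, `step`, and the other function names are all made up for this example.

```python
import random

TARGET = 10            # hypothetical goal position of the toy system
ACTIONS = (-1, 0, 1)   # move left, stay, move right


def step(state, action):
    """Toy simulator: return (next_state, reward) for a 1-D walker."""
    next_state = state + action
    reward = -abs(TARGET - next_state)  # the closer to the target, the better
    return next_state, reward


def random_rollout(rng, start=0, horizon=20):
    """Play one episode with uniformly random actions; return its total score."""
    state, score = start, 0
    for _ in range(horizon):
        state, reward = step(state, rng.choice(ACTIONS))
        score += reward
    return score


def lookahead_value(rng, state, action, depth, samples):
    """Estimate an action's value by simulating short random continuations."""
    total = 0
    for _ in range(samples):
        s, value = step(state, action)
        for _ in range(depth - 1):
            s, reward = step(s, rng.choice(ACTIONS))
            value += reward
        total += value
    return total / samples


def planned_rollout(rng, start=0, horizon=20, depth=5, samples=20):
    """Play one episode, asking the simulator which action looks best at each step."""
    state, score = start, 0
    for _ in range(horizon):
        best = max(
            ACTIONS,
            key=lambda a: lookahead_value(rng, state, a, depth, samples),
        )
        state, reward = step(state, best)
        score += reward
    return score


rng = random.Random(0)  # fixed seed so runs are repeatable
random_scores = [random_rollout(rng) for _ in range(100)]
planned_score = planned_rollout(rng)

print("mean random rollout score:", sum(random_scores) / len(random_scores))
print("simulator-guided rollout score:", planned_score)
```

In this toy setting the best possible score is -45 (walk straight to the target, then stay there), so the guided rollout should land much closer to that bound than the average random rollout does. Those guided rollouts are the kind of "high-scoring games" a learner could then be trained on.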
Here is the first talk, "Hacking Reinforced Learning", at EuroPython 2018 (English):
And the more generic talk "Reinforced Learning for developers" at Piter Py 2017 (English):
If you prefer Russian, there is a real-time translated version too: