A few days ago I read on Twitter about playing Atari games without access to the reward, that is, without knowing your score at all. This is called "curiosity-driven" learning: your only goal is to explore as much of the state space as possible, trying out new things regardless of whether they add to or subtract from the score. In the end, a neural network learns from those examples how to move around in the game while simply avoiding its end.
Large-Scale Study of Curiosity-Driven Learning: this is one of the most amazing RL papers I’ve seen since the DeepMind Atari paper in 2013, definitely worth a read!— Tony Beltramelli (@Tbeltramelli) August 16, 2018
paper: https://t.co/0VcTLn0H3K
video: https://t.co/NmX9yAgWyS pic.twitter.com/Or6TtyVPDw
Our FMC algorithm is a planning algorithm: it doesn't learn from past experience but instead decides by sampling a number of possible future outcomes after taking different actions. Still, it can scan the future without using any reward, as the sketch below illustrates.
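To make the idea concrete, here is a minimal sketch of reward-free planning, not the actual FMC implementation. It assumes a hypothetical `env.clone()` that snapshots the current simulator state and a Gym-style `step()` interface; it samples a handful of random futures and keeps the first action of the rollout whose visited states spread out the most, never looking at the reward.

import numpy as np

def plan_action(env, n_rollouts=50, horizon=20):
    """Toy reward-free planner (illustrative only, not FMC):
    sample random futures from copies of the current state and return the
    first action of the rollout that explores the most state space."""
    best_action, best_spread = None, -np.inf
    for _ in range(n_rollouts):
        sim = env.clone()                      # hypothetical: snapshot of the current game state
        first_action = sim.action_space.sample()
        action, visited, done = first_action, [], False
        for _ in range(horizon):
            obs, _reward, done, _info = sim.step(action)   # reward is discarded on purpose
            visited.append(np.asarray(obs, dtype=float).ravel())
            if done:
                break
            action = sim.action_space.sample()
        # Score the rollout by how widely its visited states spread, not by score gained.
        spread = np.mean(np.std(np.stack(visited), axis=0))
        if spread > best_spread:
            best_spread, best_action = spread, first_action
    return best_action

The only design choice that matters here is the scoring line: any measure of "how much new space did this future cover" can replace it, and the planner still never needs to see the game score.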