Implements reinforcement learning environments and algorithms as described in Sutton & Barto (1998, ISBN:0262193981). The Q-learning algorithm can be used with function approximation, eligibility traces (Singh & Sutton (1996) <doi:10.1007/BF00114726>) and experience replay (Mnih et al. (2013) <arXiv:1312.5602>).
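To illustrate how Q-learning combines with eligibility traces, here is a minimal, language-agnostic sketch in Python. It uses the tabular Watkins's Q(lambda) variant with replacing traces, which is one common choice and not necessarily the variant used here. The chain environment, the names `step`, `epsilon_greedy` and `q_lambda`, and all parameter values are hypothetical illustrations, not part of this package's API. Function approximation and experience replay are not shown.

```python
import random


def step(state, action, n_states):
    """Hypothetical deterministic chain: action 1 moves right, action 0 moves left.
    Entering the last state gives reward 1 and ends the episode."""
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(q_row, epsilon):
    """Random action with probability epsilon, else a greedy action (ties broken randomly)."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))
    best = max(q_row)
    return random.choice([a for a, v in enumerate(q_row) if v == best])


def q_lambda(n_states=6, episodes=500, alpha=0.5, gamma=0.9, lam=0.8,
             epsilon=0.1, seed=0):
    """Tabular Watkins's Q(lambda) with replacing eligibility traces (illustrative sketch)."""
    random.seed(seed)
    n_actions = 2
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        # Traces are reset at the start of every episode.
        e = [[0.0] * n_actions for _ in range(n_states)]
        state = 0
        action = epsilon_greedy(q[state], epsilon)
        done = False
        while not done:
            next_state, reward, done = step(state, action, n_states)
            next_action = epsilon_greedy(q[next_state], epsilon)
            best_next = max(q[next_state])
            # Off-policy Q-learning target: bootstrap from the greedy value.
            target = reward if done else reward + gamma * best_next
            delta = target - q[state][action]
            # Replacing trace: clear the other actions of this state, set this pair to 1.
            for a in range(n_actions):
                e[state][a] = 0.0
            e[state][action] = 1.0
            # Watkins's cut: traces survive only if the next action is greedy.
            keep = q[next_state][next_action] == best_next
            for s in range(n_states):
                for a in range(n_actions):
                    q[s][a] += alpha * delta * e[s][a]
                    e[s][a] = gamma * lam * e[s][a] if keep else 0.0
            state, action = next_state, next_action
    return q


q = q_lambda()
policy = ["right" if row[1] > row[0] else "left" for row in q[:-1]]
print(policy)
```

The traces let a single TD error update every recently visited state-action pair, so the terminal reward propagates back along the chain within one episode instead of one step per episode. Cutting the traces after an exploratory action keeps the update consistent with the greedy target that Q-learning bootstraps from.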