Reinforcement learning from imperfect data