Combining On and Off Policy Learning

Aug 9, 2017

Combining Policy Gradient and Q-learning
Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic
Bridging the Gap Between Value and Policy Based Reinforcement Learning
Sample Efficient Actor-Critic with Experience Replay
Equivalence Between Policy Gradients and Soft Q-Learning
The Reactor: A Sample-Efficient Actor-Critic Architecture