Reinforcement Learning


Deep reinforcement learning (RL) has an ever-increasing number of success stories in domains ranging from realistic simulated environments to robotics and games. Experience Replay (ER) enhances RL algorithms by using information collected in past policy iterations to compute updates for the current policy. ER has become one of the mainstay techniques for improving the sample efficiency of off-policy deep RL.
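As a minimal illustration of the replay memory that ER relies on, the Python sketch below stores transitions in a fixed-capacity first-in, first-out buffer and draws minibatches uniformly at random. The names `ReplayMemory` and `Transition` are hypothetical, and uniform sampling is only the simplest possible choice; this is not presented as the memory used in the works listed on this page.

```python
import random
from collections import deque
from typing import Any, List, NamedTuple


class Transition(NamedTuple):
    """One step of experience collected by some past (behavior) policy."""
    state: Any
    action: Any
    reward: float
    next_state: Any
    done: bool


class ReplayMemory:
    """Fixed-capacity FIFO buffer: once full, the oldest experiences are dropped."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque with maxlen discards items from the left when a new one is appended.
        self.buffer: deque = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement from the stored experiences.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)
```

For example, pushing five transitions into a memory of capacity 3 keeps only the three most recent ones; each call to `sample` then returns a minibatch that the current policy's update is computed from, regardless of which older policy generated it. That mismatch is exactly what the next paragraph addresses.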

ER recalls experiences from past iterations to compute gradient estimates for the current policy, increasing data efficiency. However, the accuracy of such updates may deteriorate when the policy diverges from past behaviors, which can undermine the performance of ER. Many algorithms mitigate this issue by tuning hyper-parameters to slow down policy changes. An alternative is to actively enforce similarity between the policy and the experiences in the replay memory. We introduce Remember and Forget Experience Replay (ReF-ER), a novel method that can enhance RL algorithms with parameterized policies. ReF-ER (1) skips gradients computed from experiences that are too unlikely under the current policy and (2) regulates policy changes within a trust region of the replayed behaviors. We couple ReF-ER with Q-learning, deterministic policy gradient and off-policy policy gradient methods. We find that ReF-ER consistently improves the performance of continuous-action, off-policy RL on fully observable benchmarks and on partially observable flow control problems.
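The two ReF-ER rules can be sketched per replayed sample as below. This is a simplified, hedged reading of the ICML 2019 paper listed further down, not its implementation. The importance weight rho = pi(a|s) / mu(a|s) compares the current policy pi with the behavior policy mu that generated the experience; a sample is treated as near-policy when 1/c_max < rho < c_max. Rule (1) zeroes the objective gradient of far-policy samples; rule (2) adds, for every sample, a penalty gradient of KL(mu || pi) weighted by (1 - beta), and beta is shrunk whenever the fraction of far-policy samples exceeds a target. The function names, the parameter names `target_fraction` and `eta`, and the use of plain Python lists as gradients are illustrative assumptions; a practical version would obtain these gradients from an autodiff framework and would also anneal c_max during training.

```python
from typing import List, Sequence


def importance_weight(pi_prob: float, mu_prob: float) -> float:
    """rho = pi(a|s) / mu(a|s) for a replayed action (current vs. behavior policy)."""
    return pi_prob / mu_prob


def is_near_policy(rho: float, c_max: float) -> bool:
    """A replayed experience counts as near-policy if 1/c_max < rho < c_max."""
    return 1.0 / c_max < rho < c_max


def refer_gradient(
    grad_objective: Sequence[float],
    grad_kl: Sequence[float],
    rho: float,
    c_max: float,
    beta: float,
) -> List[float]:
    """Per-sample ascent direction: beta * g - (1 - beta) * grad KL(mu || pi).

    Rule (1): g is skipped (multiplied by 0) for far-policy samples ("forget").
    Rule (2): the KL term pulls pi toward the replayed behavior for every sample.
    """
    keep = 1.0 if is_near_policy(rho, c_max) else 0.0
    return [beta * keep * g - (1.0 - beta) * k for g, k in zip(grad_objective, grad_kl)]


def update_beta(beta: float, far_fraction: float, target_fraction: float, eta: float) -> float:
    """Hypothetical beta schedule: strengthen the penalty (lower beta) when too
    many replayed samples are far-policy, otherwise relax it toward 1."""
    if far_fraction > target_fraction:
        return (1.0 - eta) * beta
    return (1.0 - eta) * beta + eta
```

With c_max = 4, a replayed action whose probability doubled since it was collected (rho = 2) still contributes its objective gradient, whereas one whose probability fell tenfold (rho = 0.1) contributes only the KL penalty. Keeping the penalty on far-policy samples is what lets the policy drift back toward the replayed behaviors instead of simply ignoring them.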




Contours of the vorticity field (red and blue for counterclockwise and clockwise rotation, respectively) in a 2D flow control problem: the D-section cylinder moves leftward, and the agent is marked by A and by the highlighted control force and torque. On the right, the returns obtained by V-Racer (red), ACER (purple), DDPG with ER (blue) and DDPG with ReF-ER (green). V-Racer outperforms all other methods in this fluid mechanics application, and the ReF-ER extension improves the performance of DDPG. (ACER and DDPG are other state-of-the-art RL methods.)


We use these reinforcement learning algorithms to study the behavior of agents in fluid mechanics applications, where obtaining data from interactions with the environment is extremely costly and data-efficient methods like V-Racer are imperative. In the following video, V-Racer is used to train a fish agent swimming behind a cylinder, which learns to optimize its swimming efficiency (the reward). One of the strategies the agent learns is to slalom between the vortices, harvesting energy from the flow.


  • G. Novati, L. Mahadevan, and P. Koumoutsakos, “Controlled gliding and perching through deep-reinforcement-learning,” Phys. Rev. Fluids, vol. 4, no. 9, Sep. 2019. doi:10.1103/physrevfluids.4.093902

  • G. Novati and P. Koumoutsakos, “Remember and forget for experience replay,” in Proceedings of the 36th International Conference on Machine Learning (ICML), 2019.


  • G. Novati, S. Verma, D. Alexeev, D. Rossinelli, W. M. van Rees, and P. Koumoutsakos, “Synchronisation through learning for two self-propelled swimmers,” Bioinspir. Biomim., vol. 12, no. 3, p. 036001, Mar. 2017. doi:10.1088/1748-3190/aa6311