Benchmarks

MuJoCo environments

Environment                DDPG    TRPO                   PPO       TD3
Ant-v2                     -       609.61                 969.08    1769.52
HalfCheetah-v2             -       667.06                 2607.94   6108.17
Hopper-v2                  -       1460.93                2100.74   2515.44
Humanoid-v2                -       339.35                 458.59    278.14
HumanoidStandup-v2         -       58715.82               81282.21  84551.70 (1M: 90523.85)
InvertedDoublePendulum-v2  -       8131.25                6606.13   8342.53 (1M: 8925.03)
InvertedPendulum-v2        -       900.08                 943.31    940.33 (1M: 972.17)
Reacher-v2                 -13.96  -10.08                 -10.34    -9.94
Swimmer-v2                 -       38.05                  44.17     43.55
Walker2d-v2                -       493.14 (1M: 1373.30)   1138.25   1008.70 (1M: 3394.72)

Performance after 500,000 sample steps. Values in parentheses marked "1M" are the results after 1,000,000 sample steps.