To what extent is Proximal Policy Optimisation a more efficient and accurate reinforcement learning algorithm in procedurally generated environments?