Abstract:
Hierarchies of temporally decoupled policies present a promising approach for enabling structured exploration in complex long-term planning problems. To fully realize this approach, an end-to-end training paradigm is needed. However, training these multi-level policies has had limited success due to challenges arising from interactions between the goal-assigning and goal-achieving levels within a hierarchy. In this article, we treat the policy optimization process as a multi-agent process. This allows us to draw on connections between communication and cooperation in multi-agent RL and to demonstrate the benefits of increased cooperation between sub-policies on the training performance of the overall policy. We introduce a simple yet effective technique for inducing inter-level cooperation by modifying the objective function and subsequent gradients of higher-level policies. Experimental results on a wide variety of simulated robotics and traffic control tasks demonstrate that inducing cooperation yields stronger-performing policies and improved sample efficiency on a set of difficult, long-horizon tasks. We also find that goal-conditioned policies trained with our method transfer better to new tasks, highlighting its benefits for learning task-agnostic lower-level behaviors. Videos and code are available at: https://sites.google.com/berkeley.edu/cooperative-hrl.
Publication date:
November 17, 2021
Publication type:
Preprint
Citation:
Kreidieh, A. R., Berseth, G., Trabucco, B., Parajuli, S., Levine, S., & Bayen, A. M. (2021). Inter-Level Cooperation in Hierarchical Reinforcement Learning (arXiv:1912.02368). arXiv. https://doi.org/10.48550/arXiv.1912.02368
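The abstract describes modifying the higher-level policy's objective and gradients to induce cooperation with the lower level. Below is a minimal, hedged sketch of one way such a coupling can look in PyTorch: the manager's loss is augmented with a weighted goal-reaching loss whose gradient is allowed to flow through the assigned goal. The names (`coop_weight`, `manager_advantage`) and the surrogate losses are illustrative assumptions, not the authors' released implementation.

```python
# Sketch: augmenting a higher-level (manager) loss with the lower-level
# (worker) goal-reaching loss, with gradients flowing through the goal.
# Hyperparameters and surrogate losses below are illustrative assumptions.
import torch
import torch.nn as nn

obs_dim, goal_dim = 8, 4
coop_weight = 0.5  # assumed weight on the cooperation term

# Higher-level policy: maps observations to goals for the lower level.
manager = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, goal_dim))
optimizer = torch.optim.Adam(manager.parameters(), lr=3e-4)

# Placeholder batch; in practice these come from environment rollouts.
obs = torch.randn(32, obs_dim)
achieved_state = torch.randn(32, goal_dim)   # state features the worker actually reached
manager_advantage = torch.randn(32)          # advantage estimate for the manager

goal = manager(obs)  # the higher-level "action" is the goal assigned to the worker

# Stand-in for the manager's usual RL surrogate objective (illustrative only).
manager_task_loss = -(manager_advantage * (-goal.pow(2).sum(dim=-1))).mean()

# Worker-side goal-reaching loss. The goal is NOT detached, so this term also
# back-propagates into the manager, discouraging goals the worker cannot reach.
worker_goal_loss = (achieved_state - goal).pow(2).sum(dim=-1).mean()

total_manager_loss = manager_task_loss + coop_weight * worker_goal_loss
optimizer.zero_grad()
total_manager_loss.backward()
optimizer.step()
```

The key design choice illustrated here is that the cooperation term gives the higher level a gradient signal tied to the lower level's performance, nudging it toward goals the worker can actually achieve rather than arbitrary, unreachable ones.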