Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies

Abstract

While reinforcement learning has achieved remarkable successes in several domains, its real-world application is limited due to many methods failing to generalise to unfamiliar conditions. In this work, we consider the problem of generalising to new transition dynamics, corresponding to cases in which the environment’s response to the agent’s actions differs. For example, the gravitational force exerted on a robot depends on its mass and changes the robot’s mobility. Consequently, in such cases, it is necessary to condition an agent’s actions on extrinsic state information and pertinent contextual information reflecting how the environment responds. While the need for context-sensitive policies has been established, the manner in which context is incorporated architecturally has received less attention. Thus, in this work, we present an investigation into how context information should be incorporated into behaviour learning to improve generalisation. To this end, we introduce a neural network architecture, the Decision Adapter, which generates the weights of an adapter module and conditions the behaviour of an agent on the context information. We show that the Decision Adapter is a useful generalisation of a previously proposed architecture and empirically demonstrate that it results in superior generalisation performance compared to previous approaches in several environments. Beyond this, the Decision Adapter is more robust to irrelevant distractor variables than several alternative methods.
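The core idea described above, a network that generates the weights of an adapter module from the context, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name ContextAdapterSketch, the layer sizes, the tanh hypernetwork, the single linear adapter layer and the residual connection are all assumptions chosen for illustration only.

```python
import numpy as np


def init_linear(rng, n_in, n_out):
    """Return (W, b) for a dense layer with small random weights."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


class ContextAdapterSketch:
    """Hypothetical sketch: a hypernetwork maps a context vector to adapter weights."""

    def __init__(self, context_dim, feature_dim, hyper_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.feature_dim = feature_dim
        # Assumption: the adapter is one linear layer (feature_dim x feature_dim weights + bias).
        n_adapter_params = feature_dim * feature_dim + feature_dim
        # Two-layer hypernetwork that outputs the flattened adapter parameters.
        self.W1, self.b1 = init_linear(rng, context_dim, hyper_hidden)
        self.W2, self.b2 = init_linear(rng, hyper_hidden, n_adapter_params)

    def generate_adapter(self, context):
        """Produce adapter weights and bias from the context vector."""
        hidden = np.tanh(context @ self.W1 + self.b1)
        flat = hidden @ self.W2 + self.b2
        d = self.feature_dim
        weights = flat[: d * d].reshape(d, d)
        bias = flat[d * d:]
        return weights, bias

    def __call__(self, features, context):
        """Apply the context-generated adapter to policy features."""
        weights, bias = self.generate_adapter(context)
        # Assumption: a residual connection, so the adapter adjusts rather than replaces features.
        return features + (features @ weights + bias)


# Example: the same state features are transformed differently under two contexts
# (for instance, two different robot masses; the values here are made up).
adapter = ContextAdapterSketch(context_dim=2, feature_dim=4)
features = np.ones(4)
out_light = adapter(features, np.array([1.0, 0.0]))
out_heavy = adapter(features, np.array([5.0, 0.0]))
```

In a full agent, features would come from an earlier layer of the policy network and the adapted output would feed the remaining layers; here the hypernetwork is untrained, so the outputs only show that the adapter's behaviour depends on the context.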

Publication
In Advances in Neural Information Processing Systems
Michael Beukman

I like doing cool things, such as generating levels in Minecraft and teaching robots how to kick a ball, and I go rock climbing in my spare time.

Devon Jarvis
Associate Lecturer

I am a PhD candidate and Associate Lecturer at Wits interested in studying systematic generalization and the emergence of modularity in the brain and machines.

Richard Klein
PRIME Lab Director

I am an Associate Professor in the School of Computer Science and Applied Mathematics at the University of the Witwatersrand in Johannesburg, and a co-PI of the PRIME lab.

Steven James
Deputy Lab Director

My research interests include reinforcement learning and planning.

Benjamin Rosman
Lab Director

I am a Professor in the School of Computer Science and Applied Mathematics at the University of the Witwatersrand in Johannesburg. I work in robotics, artificial intelligence, decision theory and machine learning.