Generalisation in Lifelong Reinforcement Learning through Logical Composition

Geraud Nangue Tasse, Steven James, Benjamin Rosman

December 2021

Abstract

We leverage logical composition in reinforcement learning to create a framework that enables an agent to autonomously determine whether a new task can be solved immediately using its existing abilities, or whether a task-specific skill should be learned. In the latter case, the proposed algorithm also enables the agent to learn the new task faster by generating an estimate of the optimal policy. We provide two main theoretical results: bounds on the performance of the transferred policy on a new task, and bounds on the necessary and sufficient number of tasks that need to be learned throughout an agent’s lifetime to generalise over a distribution. We verify our approach in a series of experiments, where we perform transfer learning both after learning a set of base tasks and after learning an arbitrary set of tasks. We also demonstrate that, as a side effect of our transfer learning approach, an agent can produce an interpretable Boolean expression of its understanding of the current task. Finally, we demonstrate our approach in the full lifelong setting, where an agent receives tasks from an unknown distribution and, starting from zero skills, is able to quickly generalise over the task distribution after learning only a few tasks, a number that is sub-logarithmic in the size of the task space.

Type

Workshop paper

Publication

NeurIPS Deep Reinforcement Learning Workshop

Geraud Nangue Tasse

Lecturer

Steven James

Lab Director

Benjamin Rosman

Lab Director