An important area of study in Reinforcement Learning (RL) is transfer learning, where the aim is to leverage previous experience to accelerate learning in new, unseen tasks. While it is clear that living organisms apply transfer learning throughout their lives, it is often unclear how this transfer mechanism can be incorporated into autonomous agents.
As a concrete example, consider a simple Sokoban task as below.
Once a human completes this task, they have learned core concepts about the underlying structure of the Sokoban domain. For example, they would learn that the warehouse-keeper:
- cannot walk through walls,
- cannot push a box that is adjacent to another box,
- cannot push a box that is adjacent to a wall.
Now suppose the human is given this new, more complex task to solve.
Clearly, the human would reuse the rules they had previously learned in the simple task to gain an advantage in this new task. Unfortunately, most state-of-the-art RL algorithms would not leverage such knowledge and would instead relearn everything from scratch on the more complex task. Discarding prior experience in this way is clearly inefficient.
One idea that has shown promise in transfer learning is object-oriented representation. With this approach, we view a task as being made up of instances of object classes. For example, any Sokoban task can be thought of as made up of objects that are instances of only four object classes.
Once we have grounded a task with instances of object classes, the aim of the object-oriented approach is to learn conjunctions over logical statements that map to effects over attributes. These conjunctions represent the rules of the domain. For example, we may learn a rule that says: whenever the warehouse-keeper takes a step east and there is any wall object one square east of the warehouse-keeper, then the warehouse-keeper's x attribute does not change. Clearly, such a rule is transferable between Sokoban tasks, which is exactly what we wanted to achieve.
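To make the example rule concrete, here is a minimal Python sketch of how such a condition-to-effect rule might be written and reused across two tasks. The `Obj` class, the class names ("keeper", "wall", "box"), and the function names are illustrative assumptions, not the representation used in the paper, and the fallback case ignores boxes for simplicity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Obj:
    cls: str  # object class name (illustrative: "keeper", "wall", "box")
    x: int
    y: int

def wall_east_of_keeper(state):
    """True if any wall object is one square east of the warehouse-keeper."""
    keeper = next(o for o in state if o.cls == "keeper")
    return any(
        o.cls == "wall" and (o.x, o.y) == (keeper.x + 1, keeper.y)
        for o in state
    )

def keeper_x_effect(state, action):
    """Predicted change to the keeper's x attribute (partial, hypothetical rule set)."""
    if action == "east" and wall_east_of_keeper(state):
        return 0  # the rule from the text: blocked by a wall, x does not change
    if action == "east":
        return 1  # simplifying assumption: otherwise the keeper moves (boxes ignored)
    return 0      # other actions do not affect x in this sketch

# The same rule applies unchanged to tasks of different size.
small_task = [Obj("keeper", 1, 1), Obj("wall", 2, 1)]
large_task = [Obj("keeper", 5, 3), Obj("wall", 6, 3), Obj("wall", 0, 0), Obj("box", 4, 4)]
open_task = [Obj("keeper", 1, 1), Obj("wall", 3, 1)]
```

Because the condition mentions only object classes and relative positions, never a specific wall, `keeper_x_effect` can be evaluated on `small_task` and `large_task` alike without relearning anything.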
Limitations of Previous Object-Oriented Approaches
Previous work on object-oriented representation introduced a framework that was fully propositional. Under such a framework, learning compact and transferable rules requires restricting the logical statements so that they do not refer to any grounded objects. Unfortunately, this restriction limits the expressive power of the framework.
For example, in the Sokoban domain there is no way to compactly represent the rules that apply to a box object's x attribute with propositional statements. We provide formal reasoning in our paper, but at a high level the reason is that there is an interaction between the warehouse-keeper, box and wall that cannot be resolved unless we ground the box object in question.
Resolving these Limitations with Deictic Object-Oriented Representation
In our work, we have introduced the notion of Deictic Object-Oriented Representation. The key idea of our framework is that we allow the grounding of only a single object in the logical statements: the object for which we are predicting the effect.
By incorporating this reference object, we can resolve the ambiguity that existed with the propositional approach and completely learn the rules for a richer set of domains, e.g. the Sokoban domain. As we illustrate in our paper, by using the Deictic approach we can completely and efficiently learn the rules of the entire Sokoban domain from the simple Sokoban task shown above and then zero-shot transfer these rules to the more complex Sokoban task. The more complex task has one million states and requires 209 steps to solve under an optimal policy, so learning an optimal policy from scratch would be intractable under most RL approaches.
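As a rough illustration of the deictic idea, the sketch below writes a predicate that takes the reference object as an argument, so that one rule can give different predictions for different boxes in the same state. The `Obj` class, the `touches` predicate, and the specific box rule (standard Sokoban push mechanics) are assumptions made for illustration, not the formalism from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Obj:
    cls: str  # object class name (illustrative: "keeper", "wall", "box")
    x: int
    y: int

def touches(state, ref, cls, dx):
    """Deictic predicate: is some object of class `cls` at x-offset `dx`
    (same row, dx nonzero) relative to the reference object `ref`?"""
    return any(
        o.cls == cls and (o.x, o.y) == (ref.x + dx, ref.y)
        for o in state
    )

def box_x_effect(state, ref_box, action):
    """Predicted change to ref_box's x attribute under `action` (hypothetical rule)."""
    if action != "east":
        return 0
    pushed = touches(state, ref_box, "keeper", -1)  # keeper directly west of this box
    blocked = (touches(state, ref_box, "wall", +1)
               or touches(state, ref_box, "box", +1))  # obstacle directly east of this box
    return 1 if pushed and not blocked else 0

# One state, two boxes: the same rule yields a different effect for each box.
box_a = Obj("box", 2, 1)
box_b = Obj("box", 4, 1)
state = [Obj("keeper", 1, 1), box_a, box_b, Obj("wall", 5, 1)]

# A second state where the keeper pushes a box that is against a wall.
blocked_state = [Obj("keeper", 3, 1), box_b, Obj("wall", 5, 1)]
```

Here `box_x_effect(state, box_a, "east")` and `box_x_effect(state, box_b, "east")` disagree even though the state is identical, which is the distinction that requires grounding the box in question. Only the reference object is grounded; the keeper, walls and other boxes are still referred to by class alone, which is what keeps the rule transferable.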
For more details please watch the video below or read the paper.