An important area of study in the field of Reinforcement Learning (RL) is Transfer Learning, where the aim is to leverage previous experiences to accelerate learning in new, unseen tasks. While it is clear that living organisms apply transfer learning throughout their lives, it is often unclear how the transfer mechanism exhibited by living organisms can be incorporated into autonomous agents.
As a concrete example, consider a simple Sokoban task such as the one below.
Once a human completes this task, they will have learned core concepts about the underlying structure of the Sokoban domain. For example, they will have learned that the warehouse-keeper:
- cannot walk through walls,
- cannot push a box that is adjacent to another box, and
- cannot push a box that is adjacent to a wall.
Now suppose the human is given a new, more complex task to solve.
Clearly, the human would re-use the rules they had previously learned in the simple task to gain an advantage in this new task. Unfortunately, most state-of-the-art RL algorithms would not leverage such knowledge and would instead re-learn everything from scratch on the more complex task. Discarding prior experience in this way is clearly inefficient!
One idea that has shown promise in transfer learning is the notion of object-oriented representation. In this approach, we view a task as being composed of instances of object classes. For example, any Sokoban task can be thought of as being made up of objects that are instances...
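To make the idea concrete, here is a minimal sketch of an object-oriented Sokoban representation. The class names (`Wall`, `Box`, `Keeper`) and the `can_move` helper are hypothetical, chosen for illustration; the point is that the rules listed above are written once against the object classes, so they transfer unchanged to any Sokoban task, however complex.

```python
from dataclasses import dataclass

# Hypothetical object classes: every Sokoban task, simple or complex,
# is just a different set of instances of these same classes.
@dataclass(frozen=True)
class Wall:
    pos: tuple

@dataclass(frozen=True)
class Box:
    pos: tuple

@dataclass(frozen=True)
class Keeper:
    pos: tuple

def can_move(keeper, direction, walls, boxes):
    """Task-independent rules learned from the simple task:
    the keeper cannot walk through walls, and cannot push a box
    whose destination cell is occupied by another box or a wall."""
    dx, dy = direction
    target = (keeper.pos[0] + dx, keeper.pos[1] + dy)
    wall_cells = {w.pos for w in walls}
    box_cells = {b.pos for b in boxes}
    if target in wall_cells:
        return False                      # cannot walk through walls
    if target in box_cells:               # the move would push a box
        beyond = (target[0] + dx, target[1] + dy)
        if beyond in wall_cells or beyond in box_cells:
            return False                  # box is blocked by a wall or box
    return True

# Tiny example level: keeper at (1, 1), box at (1, 2), wall at (1, 3).
walls = [Wall((1, 3))]
boxes = [Box((1, 2))]
keeper = Keeper((1, 1))
```

For instance, `can_move(keeper, (0, 1), walls, boxes)` returns `False` (the box would be pushed into the wall), while `can_move(keeper, (1, 0), walls, boxes)` returns `True`.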
DRM-connect is an algorithm for motion planning and replanning that combines dynamic reachability maps (DRM) with lazy collision checking and a fallback strategy based on the RRT-connect algorithm, which repairs the roadmap through further exploration.
Trajectory planning and replanning in complex environments often reuses very little information from previous solutions. This is particularly evident when the motion is repeated multiple times with only a limited amount of variation between each run. Graph-based planning offers fast replanning at the cost of significant pre-computation, while probabilistic planning requires no pre-computation at the cost of slow replanning.
We attempt to offer the best of both by proposing the DRM-connect algorithm.
Offline, an approximate Reeb graph is constructed from the trajectories of prior tasks in the same or similar environments.
For a new planning or replanning query, DRM-connect searches this Reeb graph for a trajectory to complete the task (checking collisions lazily). If no path is found, DRM-connect iterates between attempting to repair the disconnected subgraphs through a process similar to RRT-connect (operating on multiple graphs, rather than trees) and searching for paths through the graph. Since DRM-connect is probabilistically complete, the likelihood of a successful trajectory being returned approaches one as time tends to infinity.
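The query loop described above can be sketched in simplified form. This is not the DRM-connect implementation: the toy 2-D grid world, the BFS in place of the actual graph search, and the `repair` step (which stands in for the RRT-connect-style subgraph connection) are all illustrative assumptions. What the sketch does preserve is the structure: collisions are checked lazily at expansion time, and the algorithm alternates between searching the roadmap and growing it until a path is found.

```python
import random
from collections import deque

random.seed(0)  # deterministic sampling for the example

# Toy 2-D world on a 4x4 grid; cells in OBSTACLES are in collision.
OBSTACLES = {(1, 1), (1, 2), (2, 1)}

def in_collision(cell):
    return cell in OBSTACLES

def free_neighbours(cell):
    x, y = cell
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 4 and 0 <= y + dy < 4
            and not in_collision((x + dx, y + dy))]

def lazy_search(graph, start, goal):
    """BFS over the roadmap with lazy collision checking: a node is only
    validated when it is expanded, never when the roadmap is built."""
    frontier, parents = deque([start]), {start: None}
    while frontier:
        node = frontier.popleft()
        if in_collision(node):            # lazy check, on expansion only
            continue
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                frontier.append(nxt)
    return None

def repair(graph):
    """Stand-in for the RRT-connect-style repair step: sample a random
    free cell and wire it to its free neighbours, growing the roadmap."""
    cell = (random.randrange(4), random.randrange(4))
    if in_collision(cell):
        return
    for nxt in free_neighbours(cell):
        graph.setdefault(cell, set()).add(nxt)
        graph.setdefault(nxt, set()).add(cell)

def drm_connect(graph, start, goal, max_iters=200):
    """Alternate between lazy search and roadmap repair."""
    for _ in range(max_iters):
        path = lazy_search(graph, start, goal)
        if path is not None:
            return path
        repair(graph)
    return None

# A stale roadmap from a prior task: it routes through (1, 1), which is
# now in collision, so the lazy check prunes it and repair takes over.
roadmap = {(0, 0): {(0, 1), (1, 1)}, (0, 1): {(0, 0)},
           (1, 1): {(0, 0), (2, 1)}, (2, 1): {(1, 1)}}
path = drm_connect(roadmap, (0, 0), (3, 3))
```

Note that the initial roadmap is reused rather than discarded: only the portion invalidated by the new obstacles triggers further exploration, which is the motivation given above for combining graph-based and probabilistic planning.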
Further work will incorporate online updates...
A big congratulations to Roy Eyono whose poster, Learning to Backpropagate, was awarded the IBM Poster Prize at the 2018 Deep Learning Indaba!
We are co-organising the 2nd annual Deep Learning Indaba which will be held at Stellenbosch University from 9-14 September 2018.
The Deep Learning Indaba exists to celebrate and strengthen machine learning in Africa through state-of-the-art teaching, networking, policy debate, and through our support programmes, such as the IndabaX and the Kambule and Maathai awards. The Indaba works towards the vision of Africans becoming critical contributors, owners, and shapers of the coming advances in artificial intelligence and machine learning. The report on the outcomes of the first Indaba 2017 can be read here.
Congratulations to Steven James, who was one of four people to be awarded a Google Africa PhD Fellowship, and the only one in the "Machine Learning" category.
Congratulations to Benjamin Rosman, who was granted a Google Faculty Research Award in the "Machine Learning and Data Mining" category - the only recipient from the African continent.