Model Predictive-Actor Critic Reinforcement Learning for Dexterous Manipulation

Abstract

Dexterous multi-fingered robotic hands are a promising way for robotic manipulators to perform a wide range of complex tasks by acquiring more general-purpose skills. However, developing complex behaviours for a robot requires sophisticated control strategies, which in turn demand domain expertise and a good understanding of the mathematics and underlying physics; this is very difficult for such complex robots. Learning algorithms such as deep reinforcement learning provide a general framework for robots to learn complex behaviours directly from data. Even for learning algorithms, however, dexterous manipulators pose a major challenge because of their high dimensionality and contact-rich, under-actuated object manipulation. To overcome these challenges, learning algorithms demand extensive data generation, and such data is often expensive or hard to obtain. This has increased the need for more efficient and effective algorithms that extract more useful information from the least amount of available data. This paper contributes Model Predictive Control-Soft Actor Critic (MPC-SAC), an algorithm that combines offline learning with online planning to develop a control policy. The algorithm is benchmarked on two challenging dexterous manipulation tasks against state-of-the-art model-free reinforcement learning algorithms. The results show that the algorithm achieves asymptotic performance with state-of-the-art data efficiency in both tasks.

Publication
International Conference on Computer, Control, Electrical, and Electronics Engineering
Benjamin Rosman
Lab Director

I am a Professor in the School of Computer Science and Applied Mathematics at the University of the Witwatersrand in Johannesburg. I work in robotics, artificial intelligence, decision theory and machine learning.