The Challenge of Redundancy on Multi-agent Value Factorisation


In cooperative multi-agent reinforcement learning (MARL), the standard paradigm is centralised training with decentralised execution, in which a central critic conditions the policies of the cooperative agents on a central state. It has been shown that these methods become less effective when there are large numbers of redundant agents. More generally, an environment is likely to contain more agents than are required to solve the task. These redundant agents reduce performance by enlarging both the dimensionality of the state space and the size of the joint policy used to solve the environment. We propose leveraging layerwise relevance propagation (LRP) to instead separate the learning of the joint value function from the generation of local reward signals, and create a new MARL algorithm: relevance decomposition network (RDN). We find that although the performance of both baselines, VDN and Qmix, degrades with the number of redundant agents, RDN is unaffected.
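To make the idea of relevance propagation concrete, the following is a minimal sketch of one standard LRP rule (the epsilon rule) applied to a tiny bias-free ReLU network that maps per-agent inputs to a single joint value. It is not the RDN implementation from the paper: the network shape, the random weights, the function name `lrp_epsilon`, and the choice to model a redundant agent by zeroing its input weights are all assumptions made purely for illustration.

```python
import numpy as np


def lrp_epsilon(weights, x, eps=1e-9):
    """Epsilon-rule LRP through a bias-free ReLU MLP (illustrative sketch only).

    weights: list of 2-D arrays, one per layer, shaped (inputs, outputs).
    x: 1-D input vector, here one entry per agent (an assumption).
    Returns the network output and one relevance score per input entry.
    """
    # Forward pass, keeping the activation that feeds each layer.
    activations = [x]
    for i, W in enumerate(weights):
        z = activations[-1] @ W
        if i < len(weights) - 1:  # ReLU on hidden layers only
            z = np.maximum(z, 0.0)
        activations.append(z)
    output = activations[-1]

    # Backward pass: start with the output as the total relevance and
    # redistribute it layer by layer in proportion to each contribution.
    relevance = output.copy()
    for W, a in zip(reversed(weights), reversed(activations[:-1])):
        z = a @ W
        z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabiliser, avoids 0/0
        s = relevance / z
        relevance = a * (s @ W.T)
    return output, relevance


rng = np.random.default_rng(0)
n_agents, hidden = 3, 8

# Hypothetical mixing network over per-agent inputs.
W1 = rng.normal(size=(n_agents, hidden))
W2 = rng.normal(size=(hidden, 1))
W1[2, :] = 0.0  # agent 2 is made redundant: it cannot affect the output

weights = [W1, W2]
agent_inputs = rng.uniform(0.5, 1.5, size=n_agents)
joint_value, agent_relevance = lrp_epsilon(weights, agent_inputs)

print("joint value:     ", joint_value[0])
print("agent relevance: ", agent_relevance)
print("relevance sum:   ", agent_relevance.sum())
```

In this sketch the per-agent relevance scores sum (up to the small stabiliser) to the joint value, and the agent whose weights were zeroed receives zero relevance. That conservation property is what makes relevance scores a candidate for per-agent credit signals; how RDN actually uses them is described in the paper itself.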

International Conference on Autonomous Agents and Multiagent Systems
Benjamin Rosman
Lab Director

I am a Professor in the School of Computer Science and Applied Mathematics at the University of the Witwatersrand in Johannesburg. I work in robotics, artificial intelligence, decision theory and machine learning.