MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents

Abstract

Evaluating moral alignment in agents navigating conflicting, hierarchically structured human norms is a critical challenge at the intersection of AI safety, moral philosophy, and cognitive science. We introduce Morality Chains, a novel formalism for representing moral norms as ranked deontic constraints, and MoralityGym, a benchmark of 98 ethical-dilemma problems presented as trolley-style Gymnasium environments. By decoupling task-solving from moral evaluation and introducing a novel Morality Metric, MoralityGym allows the integration of insights from psychology and philosophy into the evaluation of norm-sensitive reasoning. Baseline results with Safe RL methods reveal key limitations, underscoring the need for more principled approaches to ethical decision-making. This work provides a foundation for developing AI systems that behave more reliably, transparently, and ethically in complex real-world contexts.

Publication
In International Conference on Autonomous Agents and Multiagent Systems
Simon Rosen

I am an experienced software developer and a master’s candidate. My research focuses on improving sample efficiency in cooperative multiagent reinforcement learning by leveraging knowledge transfer.

Ebenezer Gelo

My research interests include batch reinforcement learning and knowledge transfer. When not exploring the frontiers of AI/ML, you can find me on League of Legends (making questionable plays).

Ibrahim Suder

I have an avid interest in intelligent robotics and, more broadly, the pursuit of generally intelligent systems.

Victoria Williams
Postdoctoral Researcher

I have an interdisciplinary background in the neurosciences (MA in Anthropology, PhD in Neuroanatomy, postdoctorate in Psychology), which gives me a holistic understanding of the human brain, from anatomy and behaviour to evolution. I am interested in integrating this knowledge into my postdoctorate in Computer Science.

Benjamin Rosman
Lab Director

I am a Professor in the School of Computer Science and Applied Mathematics at the University of the Witwatersrand in Johannesburg. I work in robotics, artificial intelligence, decision theory and machine learning.

Geraud Nangue Tasse
Lecturer

I am interested in reinforcement learning (RL) since it is the subfield of machine learning with the most potential for achieving AGI.

Steven James
Deputy Lab Director

My research interests include reinforcement learning and planning.