MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents

Simon Rosen, Siddarth Singh, Ebenezer Gelo, Helen Sarah Robertson, Ibrahim Suder, Victoria Williams, Benjamin Rosman, Geraud Nangue Tasse, Steven James

May 2026

Abstract

Evaluating moral alignment in agents navigating conflicting, hierarchically structured human norms is a critical challenge at the intersection of AI safety, moral philosophy, and cognitive science. We introduce Morality Chains, a novel formalism for representing moral norms as ranked deontic constraints, and MoralityGym, a benchmark of 98 ethical-dilemma problems presented as trolley-style Gymnasium environments. By decoupling task-solving from moral evaluation and introducing a novel Morality Metric, MoralityGym allows the integration of insights from psychology and philosophy into the evaluation of norm-sensitive reasoning. Baseline results with Safe RL methods reveal key limitations, underscoring the need for more principled approaches to ethical decision-making. This work provides a foundation for developing AI systems that behave more reliably, transparently, and ethically in complex real-world contexts.

Type

Conference paper

Publication

In International Conference on Autonomous Agents and Multiagent Systems



Simon Rosen

Siddarth Singh

Ebenezer Gelo

Ibrahim Suder

Victoria Williams

Postdoctoral Researcher

Benjamin Rosman

Lab Director

Geraud Nangue Tasse

Lecturer

Steven James

Deputy Lab Director