Using Simple Deontic Constraints for Fast Norm-Conforming Reinforcement Learning

2025

Conference: The 17th International Conference on Deontic Logic and Normative Systems

Matthias Scheutz and Daniel Little

Standard reinforcement learning (RL) methods discover policies that maximize a reward signal but cannot quickly learn normative behavior. We propose a novel approach that uses expert demonstrations to generate simple constraints expressed with deontic operators, which guide the agent's decision-making process. The agent uses those demonstrations to learn which actions may be obligated or permitted in certain states. By forcing the agent to take actions it identifies as obligated, we significantly reduce the state-space complexity of the learning problem. Furthermore, we show how, after learning low-level obligated actions, the agent can cluster the received demonstrations and analyze commonly occurring subsequences, allowing it to learn higher-level obligations. We demonstrate that our method learns faster and commits no norm violations in a hybrid-planning supermarket shopping task.
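As a rough sketch of the idea (not the authors' implementation), the code below assumes a tabular Q-learning agent and a hypothetical demonstrations variable holding expert trajectories as lists of (state, action) pairs. Actions that appear as the expert's only choice for a state across all demonstrations are treated as obligated and are forced during action selection, which shrinks the space the learner must explore; a simple subsequence count then surfaces candidate higher-level obligations. All function names, parameters, and thresholds are illustrative assumptions.

# Hypothetical sketch of demonstration-derived deontic constraints guiding
# tabular Q-learning; names and thresholds are illustrative, not the paper's.
import random
from collections import Counter, defaultdict

def extract_obligations(demonstrations):
    """Treat an action as obligated in a state if it is the only action
    the expert ever takes in that state across all demonstrations."""
    actions_seen = defaultdict(set)
    for trajectory in demonstrations:
        for state, action in trajectory:
            actions_seen[state].add(action)
    return {s: next(iter(acts)) for s, acts in actions_seen.items() if len(acts) == 1}

def choose_action(q_table, obligations, state, actions, epsilon=0.1):
    """Force the obligated action when one is known for this state;
    otherwise fall back to epsilon-greedy selection. Constraining choice
    in obligated states reduces the space the agent has to explore."""
    if state in obligations:
        return obligations[state]
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.99):
    """Standard one-step Q-learning update; learning proceeds as usual
    in the states that remain unconstrained."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    q_table[(state, action)] += alpha * (reward + gamma * best_next
                                         - q_table[(state, action)])

def common_subsequences(demonstrations, length=3, min_support=0.8):
    """Count contiguous action subsequences of a fixed length and keep those
    occurring in at least min_support of the demonstrations, as candidate
    higher-level obligations."""
    counts = Counter()
    for trajectory in demonstrations:
        actions = [a for _, a in trajectory]
        counts.update({tuple(actions[i:i + length])
                       for i in range(len(actions) - length + 1)})
    threshold = min_support * len(demonstrations)
    return [seq for seq, count in counts.items() if count >= threshold]

In use, q_table would be a defaultdict(float), and choose_action / q_update would be called inside the usual environment interaction loop. The actual constraint generation, demonstration clustering, and hybrid-planning integration are described in the paper linked below.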

@inproceedings{scheutzlittle25deon,
  title={Using Simple Deontic Constraints for Fast Norm-Conforming Reinforcement Learning},
  author={Matthias Scheutz and Daniel Little},
  year={2025},
  booktitle={The 17th International Conference on Deontic Logic and Normative Systems},
  url={https://hrilab.tufts.edu/publications/scheutzlittle25deon.pdf}
}