Exploring Verification Frameworks for Social Choice Alignment
Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning, PMLR 284:439-446, 2025.
Abstract
The deployment of autonomous agents that interact with humans in safety-critical situations raises new research problems as we move towards fully autonomous systems in domains such as autonomous vehicles or search and rescue. If autonomous agents are placed in a dilemma, how should they act? The literature in computational ethics has explored the actions and learning methods that emerge in ethical dilemmas, but such dilemmas are not isolated in a social vacuum. The central claim of this position paper is that enabling trust among all human users requires neurosymbolic verification of moral preference alignment. We propose applying formal robustness properties to social choice modelling and outline how such properties can help validate the formation of stable social preference clusters in deep neural network classifiers. Our initial results highlight the vulnerability of these models to perturbations in moral-critical scenarios, suggesting a verification-training loop for improved robustness. Based on these initial results, we position this work as an inquiry into the viability of verifying moral preference alignment. Ultimately, we aim to contribute to the broader interdisciplinary effort integrating formal methods, social choice theory, and empirical moral psychology for interpretable computational ethics.
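To make the notion of a robustness property for a preference classifier concrete, the following is a minimal sketch, not the paper's actual model or verification tool: it assumes a hypothetical feature-vector encoding of a moral dilemma, a placeholder PyTorch classifier mapping dilemmas to preference clusters, and an illustrative epsilon. It performs only an empirical sampling check of the local robustness property (that no input within an L-infinity epsilon-ball changes the predicted cluster); a complete verifier such as an SMT- or bound-propagation-based tool would be needed for an actual proof.

import torch
import torch.nn as nn

# Illustrative stand-in for a moral-preference classifier: maps a feature
# encoding of a dilemma to one of k preference clusters. Architecture,
# feature dimension, and cluster count are placeholders, not the paper's setup.
class PreferenceClassifier(nn.Module):
    def __init__(self, n_features: int = 8, n_clusters: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(),
            nn.Linear(32, n_clusters),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def is_locally_robust(model: nn.Module, x: torch.Tensor,
                      epsilon: float = 0.05, n_samples: int = 1000) -> bool:
    """Empirical check of the property:
    for all x' with ||x' - x||_inf <= epsilon, argmax f(x') == argmax f(x).
    Random sampling can only falsify the property, not prove it."""
    model.eval()
    with torch.no_grad():
        label = model(x).argmax(dim=-1)
        # Sample perturbations uniformly from the epsilon-ball around x.
        noise = (torch.rand(n_samples, x.shape[-1]) * 2 - 1) * epsilon
        perturbed_labels = model(x + noise).argmax(dim=-1)
        return bool((perturbed_labels == label).all())

if __name__ == "__main__":
    torch.manual_seed(0)
    model = PreferenceClassifier()
    dilemma = torch.rand(1, 8)  # placeholder encoding of one dilemma
    print("locally robust:", is_locally_robust(model, dilemma))

In a verification-training loop of the kind the abstract alludes to, inputs that falsify such a property would be fed back as counterexamples during retraining; the sketch above only shows the property-checking side.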