Zero-shot coordination (ZSC) is an important and challenging problem setting in decentralized partially observable Markov decision processes, in which an AI agent must coordinate with novel partners without any prior coordination. However, agents trained with standard multi-agent reinforcement learning algorithms in the self-play setting learn policies by playing the game with themselves, and thus fail to coordinate when paired with humans or other novel agents. Although some recent works attribute this failure to highly specialized conventions, they do not explain how such conventions are adopted, which leaves the argument unconvincing. To fill this gap, we first construct a model that describes the formation of specialized conventions. We then present unspecialized belief learning (UBL), a method that controls the degree of specialization in an agent's belief. UBL reduces specialized conventions and achieves significant performance in both a simple toy setting and the benchmark problem Hanabi. Finally, we discuss future directions for ZSC, including human-AI teaming and the interpretability of AI.
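The abstract does not specify UBL's mechanism, so the following is only an illustrative sketch of what "controlling the specialization of belief" could look like: a belief network predicts a distribution over hidden states from the agent's action-observation history, and a hypothetical entropy regularizer discourages over-confident beliefs that would only be valid under a self-play partner's idiosyncratic conventions. The network shape, the regularizer, and all names (`BeliefNet`, `lambda_unspec`) are assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of belief learning with a specialization-control term.
# NOT the paper's UBL algorithm; all design choices here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BeliefNet(nn.Module):
    """Predicts a distribution over hidden states from the agent's
    action-observation history (encoded here with a GRU)."""
    def __init__(self, obs_dim: int, hidden_dim: int, n_states: int):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_states)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, time, obs_dim) -> logits over hidden states
        out, _ = self.rnn(history)
        return self.head(out[:, -1])  # belief after the final timestep

def belief_loss(logits, true_state, lambda_unspec=0.1):
    """Cross-entropy to the true hidden state, minus an entropy bonus.
    The (assumed) bonus penalizes over-confident, specialized beliefs."""
    ce = F.cross_entropy(logits, true_state)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
    return ce - lambda_unspec * entropy

# Toy usage: random tensors standing in for Hanabi-style trajectories.
net = BeliefNet(obs_dim=16, hidden_dim=64, n_states=10)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
history = torch.randn(32, 20, 16)         # (batch, time, obs)
true_state = torch.randint(0, 10, (32,))  # ground-truth hidden state
loss = belief_loss(net(history), true_state)
opt.zero_grad(); loss.backward(); opt.step()
```

Under this reading, `lambda_unspec` would trade off belief accuracy against unspecialization: at zero the belief may overfit to self-play conventions, while larger values keep the belief closer to uniform over hidden states.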