Zero-shot coordination (ZSC) is an important and challenging problem setting in decentralized partially observable Markov decision processes, in which an AI agent must coordinate with novel partners without any prior coordination. However, agents trained with standard multi-agent reinforcement learning algorithms in the self-play setting learn policies by playing the game with themselves, and thus fail to coordinate when paired with humans or other novel agents. Although some recent works attribute this failure to highly specialized conventions, they do not explain how such conventions are adopted, which leaves the argument unconvincing. To fill this gap, we first construct a model that describes the formation of specialized conventions. We then present unspecialized belief learning (UBL), a method that controls the degree of specialization in an agent's belief. UBL reduces specialized conventions and achieves significant performance in both a simple toy setting and the benchmark problem Hanabi. Finally, we discuss future directions for ZSC, including human-AI teaming and the interpretability of AI.
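The abstract does not specify UBL's mechanism, so the following is only an illustrative sketch of what "controlling the specialization of belief" could look like: a belief network predicts a distribution over hidden states from the agent's action-observation history, and a hypothetical entropy regularizer discourages over-confident beliefs that would only be valid under a self-play partner's idiosyncratic conventions. The network shape, the regularizer, and all names (`BeliefNet`, `lambda_unspec`) are assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of belief learning with a specialization-control term.
# NOT the paper's UBL algorithm; all design choices here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BeliefNet(nn.Module):
    """Predicts a distribution over hidden states from the agent's
    action-observation history (encoded here with a GRU)."""
    def __init__(self, obs_dim: int, hidden_dim: int, n_states: int):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_states)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, time, obs_dim) -> logits over hidden states
        out, _ = self.rnn(history)
        return self.head(out[:, -1])  # belief after the final timestep

def belief_loss(logits, true_state, lambda_unspec=0.1):
    """Cross-entropy to the true hidden state, minus an entropy bonus.
    The (assumed) bonus penalizes over-confident, specialized beliefs."""
    ce = F.cross_entropy(logits, true_state)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
    return ce - lambda_unspec * entropy

# Toy usage: random tensors standing in for Hanabi-style trajectories.
net = BeliefNet(obs_dim=16, hidden_dim=64, n_states=10)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
history = torch.randn(32, 20, 16)         # (batch, time, obs)
true_state = torch.randint(0, 10, (32,))  # ground-truth hidden state
loss = belief_loss(net(history), true_state)
opt.zero_grad(); loss.backward(); opt.step()
```

Under this reading, `lambda_unspec` would trade off belief accuracy against unspecialization: at zero the belief may overfit to self-play conventions, while larger values keep the belief closer to uniform over hidden states.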