Deep Reinforcement Learning for Mahjong

BA Dissertation supervised by Dr Sean Holden

Xiangyu Zhao, Sean B. Holden

14 May 2021

Mahjong is a popular multi-player imperfect-information game developed in China in the late 19th century, with some very challenging features for AI research. Sanma, a 3-player variant of Japanese Riichi Mahjong, possesses unique characteristics including fewer tiles and, consequently, a more aggressive playing style. It is thus challenging and of great research interest in its own right, but has not yet been explored. In this project, I built Meowjong, an AI for Sanma using deep reinforcement learning. I defined an informative and compact 2-dimensional data structure for encoding the observable information in a Sanma game. Then, I pre-trained 5 convolutional neural networks (CNNs) for Sanma's 5 actions (discard, Pon, Kan, Kita and Riichi), and enhanced the model for the major action, the discard model, via self-play reinforcement learning using the Monte Carlo policy gradient method. I also implemented the necessary additional modules, including data collection and processing, Mahjong hand calculation and a game simulator, so that the agents could be trained and evaluated. Through supervised learning, Meowjong's models achieved test accuracies comparable with those of AIs for 4-player Mahjong, and the discard model gained a significant further enhancement from reinforcement learning. As the first AI for Sanma, Meowjong stands as the state of the art in this game.
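
As a rough illustration of the Monte Carlo policy gradient method mentioned above, the following is a minimal sketch of a single update step. It is not Meowjong's implementation: it assumes a linear softmax policy in place of a CNN, and the function names, feature vectors, learning rate and discount factor are all hypothetical choices made for the example.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = sum_k gamma^k * r_{t+k} for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()


def reinforce_update(theta, states, actions, rewards, lr=0.1, gamma=0.99):
    """One Monte Carlo policy gradient step on a single finished episode.

    theta   : (n_features, n_actions) weights of a linear softmax policy
              (an assumption; the dissertation uses a CNN instead).
    states  : sequence of feature vectors, one per step.
    actions : sequence of chosen action indices, one per step.
    rewards : sequence of rewards, one per step.
    """
    returns = discounted_returns(rewards, gamma)
    grad = np.zeros_like(theta)
    for state, action, g in zip(states, actions, returns):
        probs = softmax(state @ theta)
        one_hot = np.zeros_like(probs)
        one_hot[action] = 1.0
        # For a linear softmax policy, the gradient of log pi(a|s) with
        # respect to theta is outer(s, one_hot(a) - pi(.|s)).
        grad += g * np.outer(state, one_hot - probs)
    # Gradient ascent on the expected return.
    return theta + lr * grad
```

In this sketch, an action followed by a positive return becomes more probable in the state where it was taken, which is the basic mechanism by which self-play outcomes can refine a pre-trained policy.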

A paper from this dissertation has been published at the 2022 IEEE Conference on Games (CoG).