Development of AlphaZero-based Reinforcment Learning Algorithm for Solving Partially Observable Markov Decision Process (POMDP) Problem

Tomoaki Kimura, Katsuyoshi Sakamoto, Tomah Sogabe


In recent years deep reinforcement learning (DRL) methods have advances rapidly, so DRL is applied in many fields. Most of DRL algorithms assume that the information from the environment is perfectly observed. However, in many real problems, the information from the environment is not fully observed. Such a problem is treated as a Partially Observable Markov Decision Processes (POMDPs). So, algorithms that solve POMDPs are important in applying DRL to real-world. In this paper, we apply AlphaZero, a deep reinforcement learning algorithm that achieve great performance in game, to POMDPs and show that the algorithm may be effective for POMDPs using a partially observable maze problem.


reinforcement learning; AlphaZero; POMDPs; maze problem

Full Text:



  • There are currently no refbacks.