Learning-based techniques offer an alternative approach to solving Dynamic Distributed Constraint Optimization Problems (DDCOPs) and are computationally cheaper than sequential DCOP solvers. This paper proposes a learning-based solution for DDCOPs in which the environment is stochastic due to the presence of multiple agents. In our approach, the problem is modelled as a multi-agent Markov Decision Process, and a learning automaton, a relatively simple method that requires less qualitative data, is then employed to learn how to assign values to variables. The proposed method accounts for two important issues that govern how values are allocated to variables: dependency between time steps and uncertainty about future events. Experimental results, compared against well-known methods, show that the proposed method converges and satisfies the constraints of the optimization problems.
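For readers unfamiliar with learning automata, the following Python sketch illustrates the general idea under stated assumptions. It uses the linear reward-inaction (L_R-I) update, one common learning-automaton scheme; the abstract does not say which update rule the method uses, so L_R-I, the class `LinearRewardInaction`, the learning rate, and the toy reward function `toy_environment` are all hypothetical and serve only to show how an automaton can learn a value assignment for a single variable from stochastic feedback.

```python
import random
from typing import List


class LinearRewardInaction:
    """Hypothetical L_R-I learning automaton choosing a value for one variable.

    The L_R-I scheme is an assumption for illustration; it is not
    necessarily the update rule used by the proposed method.
    """

    def __init__(self, num_actions: int, learning_rate: float, rng: random.Random) -> None:
        if num_actions < 1:
            raise ValueError("need at least one action")
        if not 0.0 < learning_rate < 1.0:
            raise ValueError("learning_rate must lie in (0, 1)")
        # Start with a uniform distribution over the variable's domain values.
        self.probs: List[float] = [1.0 / num_actions] * num_actions
        self.a = learning_rate
        self.rng = rng

    def choose(self) -> int:
        # Sample an action (a candidate value) from the current distribution.
        return self.rng.choices(range(len(self.probs)), weights=self.probs, k=1)[0]

    def update(self, action: int, rewarded: bool) -> None:
        # Penalty: leave the distribution unchanged ("inaction").
        if not rewarded:
            return
        # Reward: move probability mass toward the chosen action.
        # The update keeps the probabilities summing to 1.
        for j in range(len(self.probs)):
            if j == action:
                self.probs[j] += self.a * (1.0 - self.probs[j])
            else:
                self.probs[j] *= 1.0 - self.a


def toy_environment(value: int, rng: random.Random) -> bool:
    # Hypothetical stochastic feedback: value 2 satisfies the constraint most often.
    success = [0.2, 0.3, 0.9]
    return rng.random() < success[value]


if __name__ == "__main__":
    rng = random.Random(0)
    agent = LinearRewardInaction(num_actions=3, learning_rate=0.05, rng=rng)
    for _ in range(2000):
        v = agent.choose()
        agent.update(v, toy_environment(v, rng))
    print([round(p, 3) for p in agent.probs])
```

Because L_R-I only updates on favourable feedback, the action probabilities tend to concentrate on the value that is rewarded most often; in a DDCOP setting, each agent would run such an automaton per variable, with rewards derived from constraint satisfaction.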