Flow control has attracted considerable research effort owing to its potential benefits in engineering applications, and it is finding renewed interest thanks to the rise of data-driven modeling and control algorithms. Interestingly, due to their nonlinearities and strongly convective nature, which introduces time delays, fluid systems can also serve as challenging test-beds for the further development of control algorithms; indeed, despite the large potential, the limitations imposed by the plant represent a challenge, and the inherent complexity of the dynamics at play requires appropriate strategies during the training process. In this contribution, we consider a reinforcement learning framework and introduce an actor-critic algorithm well suited to tackle these challenges. The presented algorithm i) is data-parsimonious, ii) leverages an optimistic ordering in the value iteration, and iii) is equipped with policy-improvement-based error bounds. These features speed up convergence and provide a theoretically grounded stopping criterion. As test cases, we consider a linearized version of the Kuramoto-Sivashinsky equation and the control of instabilities in a two-dimensional boundary-layer flow developing over a flat plate, using a minimal number of sensors and actuators. Compared with analogous works in the literature, we show that retaining a rather classical plant setup is sufficient to guarantee adequate performance, provided that the input-output dynamics as well as the observability properties are taken into account.
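For reference, a minimal sketch of the type of model mentioned above: the standard one-dimensional Kuramoto-Sivashinsky equation and a generic linearization about a constant base state with an actuation forcing term. The precise coefficients, boundary conditions, and sensor/actuator arrangement adopted in this work are not reproduced here and may differ.

% Standard one-dimensional Kuramoto-Sivashinsky equation for v(x,t)
\begin{equation}
  \frac{\partial v}{\partial t}
  + v\,\frac{\partial v}{\partial x}
  + \frac{\partial^{2} v}{\partial x^{2}}
  + \frac{\partial^{4} v}{\partial x^{4}}
  = 0 .
\end{equation}
% Illustrative linearization about a constant base state V: the convective term
% V \partial v'/\partial x is the source of the time delays mentioned above;
% f(x,t) is a (hypothetical) localized actuation forcing.
\begin{equation}
  \frac{\partial v'}{\partial t}
  + V\,\frac{\partial v'}{\partial x}
  + \frac{\partial^{2} v'}{\partial x^{2}}
  + \frac{\partial^{4} v'}{\partial x^{4}}
  = f(x,t) .
\end{equation}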