In this paper we address how complex and dynamic environments, characterized by variable limited capacities and subject to transient unavailability, pose significant challenges for a Deep Reinforcement Learning (DRL) agent. The investigation concerns the Service Function Chaining (SFC) orchestration problem in environments based on Software-Defined Networking (SDN) and Network Function Virtualization (NFV), using a DRL approach implemented through a Deep Q-Network (DQN) and aiming to maximize Quality of Experience (QoE) while meeting Quality of Service (QoS) constraints. We show through numerical results how limited capacity in the Physical Substrate Network (PSN) complicates the training process, requiring a suitable compromise between performance and convergence. We also highlight how the replay buffer may mitigate transient unavailability of PSN nodes, and the limits of this solution when the unavailability becomes more prolonged or more severe (simultaneous unavailability of more than one node).
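To make the mitigation mechanism concrete, the sketch below shows a minimal experience replay buffer of the kind standard DQN agents use. This is an illustrative assumption, not the paper's implementation: the class name `ReplayBuffer`, the transition tuple layout, and the capacity are all hypothetical, chosen only to show why old experience can compensate for a transient node outage.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state) transitions.

    Illustrative sketch only; field names and capacity are assumptions.
    """
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling mixes old and recent experience, so transitions
        # gathered while a PSN node was still available can keep driving
        # Q-network updates during a short outage of that node.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

# Example: transitions collected before a (hypothetical) node outage
# remain sampleable during the outage itself.
buf = ReplayBuffer(capacity=1000)
for t in range(100):
    buf.add((t, 0, 1.0, t + 1))  # placeholder transitions
batch = buf.sample(32)
```

This also hints at the limitation the abstract raises: once an outage persists longer than the buffer's capacity window, the stale pre-outage transitions are evicted (or no longer reflect the environment), and uniform replay alone can no longer bridge the gap.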