Next-generation networks, such as beyond-fifth-generation (B5G) and sixth-generation (6G) networks, have diverse network resource demands that imply significant changes in the architecture and infrastructure of telecommunication networks. Network slicing (NS) is considered a candidate technology, in which a single network infrastructure is divided into multiple (virtual) slices to meet different service requirements. Combining device-to-device (D2D) communication with NS can improve spectrum utilization, thereby providing better performance and scalability. This paper addresses the challenging problem of dynamic resource allocation in wireless network slices with D2D using deep reinforcement learning (DRL) techniques. More specifically, we propose an approach named DDPG-KRP, based on deep deterministic policy gradient (DDPG) with K-nearest neighbors (KNN) and reward penalization (RP) for eliminating undesirable actions, to determine a resource allocation policy that maximizes long-term reward. Simulation results show that DDPG-KRP is a promising solution for resource allocation in next-generation wireless networks, outperforming the other DRL algorithms considered.
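As a rough illustrative sketch (not the paper's implementation), the two auxiliary components named in the abstract can be outlined as follows: KNN maps the continuous proto-action produced by the DDPG actor onto nearby valid discrete resource-allocation actions, and reward penalization replaces the environment reward with a penalty when an undesirable action is selected. All function names, the candidate action set, and the penalty value are hypothetical.

```python
import numpy as np

def knn_candidate_actions(proto_action, discrete_actions, k=3):
    """Return the k discrete actions nearest (Euclidean distance) to a
    continuous proto-action, e.g. one output by a DDPG actor network."""
    dists = np.linalg.norm(discrete_actions - proto_action, axis=1)
    return discrete_actions[np.argsort(dists)[:k]]

def penalized_reward(action, base_reward, is_valid, penalty=-1.0):
    """Reward penalization: undesirable (invalid) actions receive a fixed
    penalty instead of the environment reward, discouraging the agent
    from selecting them again."""
    return base_reward if is_valid(action) else penalty

# Toy example: 2 resource blocks, binary allocation per block.
actions = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
proto = np.array([0.9, 0.1])          # continuous output of the actor
candidates = knn_candidate_actions(proto, actions, k=2)
# Assume a capacity constraint: at most one block may be allocated.
valid = lambda a: a.sum() <= 1
rewards = [penalized_reward(a, base_reward=1.0, is_valid=valid)
           for a in candidates]
```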