Hybrid beamforming has received significant attention as a solution to the thermal issues, costs, and implementation complexities associated with fully digital mmWave Extremely Large MIMO (XL-MIMO) systems. The hybrid approach offers a balance between system flexibility and performance. However, even with optimal beamforming, network delay and throughput remain unsatisfactory when a large number of User Equipments (UEs) are present and the data rate is not shared effectively among them. Considering multiple temporal characteristics of an indoor wireless channel, we propose a multi-layer scheme to optimize resource allocation to the UEs in a cooperative multi-Base Station (BS) setup. We study the performance of different resource allocation techniques that share the bandwidth effectively among UEs, rather than merely maximizing the total downlink bitrate. Our findings indicate that queue length-based allocation strategies yield the lowest delay under different downlink traffic conditions and UE/BS deployment densities. Moreover, selectively serving a subset of UEs in each interval, instead of serving all UEs simultaneously in each channel block, significantly improves network delay and throughput. To further improve resource utilization and overall performance, we introduce an extension called sub-coherence time allocation. This technique accounts for early downlink queue exhaustion and speculatively calculates multiple digital precoders to be activated sequentially within a channel block. Simulation results demonstrate that this approach reduces delay and jitter with minimal computational overhead, with decreases of 25% and 15% in the digital and hybrid modes, respectively.