Forest fires cause ecological damage, property loss, and human casualties. Existing forest fire smoke detection methods do not adequately account for the high transparency and indistinct edges of smoke, so their detection accuracy is too low for complex aerial forest fire smoke detection tasks. In this paper, we propose DRGNet for high-accuracy detection of forest fire smoke; it combines Dual-ResNet50-vd with SoftPool, a recursive feature pyramid with deconvolution and dilated convolution, and global optimal non-maximum suppression. First, the Dual-ResNet50-vd module is proposed to strengthen the extraction of features from highly transparent smoke without clear edges, and SoftPool is used in place of conventional pooling to retain more smoke feature information. Then, a recursive feature pyramid with deconvolution and dilated convolution (RDDFPN) is proposed to fuse shallow visual features and deep semantic information along the channel dimension, improving the accuracy of long-range aerial smoke detection. Finally, global optimal non-maximum suppression (GO-NMS) defines an objective function that globally optimizes the selection of anchor boxes, adapting detection to aerial forest fire scenes in which smoke appears at multiple locations. Experimental results on the UAV-IoT platform show that DRGNet has 53.48 M parameters and a computational cost of 55.78 GFLOPs, runs at 122.5 FPS, and achieves an mAP of 79.03%, an mAP50 of 90.26%, and an mAP75 of 82.35%. Compared with other mainstream methods, it offers real-time detection and high accuracy.
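To illustrate why SoftPool can retain more feature information than max pooling, the following is a minimal NumPy sketch of the general SoftPool operation: each output is the softmax-weighted sum of the activations in its pooling window, so weak responses (such as faint, semi-transparent smoke) still contribute instead of being discarded. This is not the DRGNet implementation; the function name `softpool2d`, the window size `k`, and the use of non-overlapping windows on a single-channel map are assumptions made for illustration.

```python
import numpy as np


def softpool2d(x, k=2):
    """Illustrative SoftPool over non-overlapping k x k windows of a 2-D map.

    Assumption: stride equals the window size k, and rows/columns that do
    not fill a complete window are dropped.
    """
    h, w = x.shape
    h, w = h - h % k, w - w % k  # trim to a multiple of k
    # Rearrange into (rows, cols, k*k) so each window is a flat vector.
    windows = (
        x[:h, :w]
        .reshape(h // k, k, w // k, k)
        .transpose(0, 2, 1, 3)
        .reshape(h // k, w // k, k * k)
    )
    # Softmax weights per window (shifted by the max for numerical stability).
    shifted = windows - windows.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum: larger activations dominate, smaller ones still count.
    return (weights * windows).sum(axis=-1)


if __name__ == "__main__":
    feature_map = np.arange(16, dtype=float).reshape(4, 4)
    print(softpool2d(feature_map, k=2))
```

For any window, the result lies between the average-pooled and max-pooled values, which is the sense in which SoftPool keeps information that max pooling would drop.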