Abstract:
Targets in drone aerial images are typically small, densely distributed, and easily degraded by complex backgrounds, motion blur, and occlusion. Combined with the limited computing resources of edge devices, this makes it difficult for detection algorithms to balance accuracy and efficiency. To address these challenges, this paper proposes GC-YOLO, an improved model based on the YOLO11 framework. First, the DCAD module is introduced in the downsampling stage to fuse local and global information, enhancing fine-grained feature representation. Second, a GC-PAFPN feature pyramid is constructed by adding cross-scale fusion paths and incorporating the CSP-Omni-Kernel module to improve multi-scale feature interaction. Finally, a Shape- and Semantic-Aligned Decoupled Head (SADH) is designed to reduce parameters and computational cost while improving the consistency between classification and localization. Experimental results on the VisDrone2021 dataset show that GC-YOLO achieves an mAP50 of 46.4% and an mAP50–95 of 28.7%, improvements of 7.9 and 5.6 percentage points over YOLO11n (38.5% and 23.1%), respectively. With 4.02M parameters and 18.7 GFLOPs, the model improves small-object detection in aerial images while remaining lightweight, providing a feasible solution for onboard drone target detection.
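The reported gains can be checked with a short calculation. The sketch below uses only the four accuracy figures quoted above. The helper name `gains` and the dictionary layout are illustrative assumptions, not part of the proposed method, and the relative gain is a derived quantity that the abstract itself does not report.

```python
# Accuracy figures quoted in the abstract, in percent.
REPORTED = {
    "mAP50": {"GC-YOLO": 46.4, "YOLO11n": 38.5},
    "mAP50-95": {"GC-YOLO": 28.7, "YOLO11n": 23.1},
}


def gains(ours, baseline):
    """Return (absolute gain in percentage points, relative gain in percent).

    Hypothetical helper for illustration; both values are rounded to one decimal.
    """
    absolute = round(ours - baseline, 1)
    relative = round(100.0 * (ours - baseline) / baseline, 1)
    return absolute, relative


if __name__ == "__main__":
    for metric, values in REPORTED.items():
        abs_pp, rel_pct = gains(values["GC-YOLO"], values["YOLO11n"])
        print(f"{metric}: +{abs_pp} points ({rel_pct}% relative to YOLO11n)")
```

Under these figures, the absolute gains of 7.9 and 5.6 percentage points correspond to relative gains of roughly 20.5% and 24.2% over the YOLO11n baseline.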