LRS-RT-DETR: A Long-Range Floating Garbage Detector with Multi-Scale Feature Fusion

Chenglong Lu; Xiangguo Sun

doi:10.54097/0d2gv996

Authors

Chenglong Lu, School of Mechanical Engineering, Sichuan University of Science & Engineering, Yibin 644000, China
Xiangguo Sun, School of Mechanical Engineering, Sichuan University of Science & Engineering, Yibin 644000, China

Keywords:

Water Surface Garbage Detection, Small Object Detection, Feature Pyramid Network, Multi-Scale Feature Fusion

Abstract

Real-time detection of floating garbage on water surfaces is important for water environment management. However, when a garbage target is more than 30 meters from the camera, it occupies only a few pixels in the image, and the complex water surface background introduces further interference, so general-purpose object detectors struggle to achieve satisfactory performance. In this paper, we propose LRS-RT-DETR, a long-range small-object detection Transformer built on the RT-DETR-R18 baseline and adapted to detecting small, distant garbage on water surfaces. First, we design a High-Resolution Feature Fusion Pyramid Network (HRFF-FPN). Using the lossless Space-to-Depth (SPD) reorganization operation, it introduces the high-resolution P2 feature into the P3 detection node, significantly enhancing the model’s spatial perception of extremely small distant targets without adding extra detection heads. Second, we propose a Multi-Scale Kernel Branch (MSKB) module that processes the features fused from the P2 layer, combining skip connections with parallel multi-scale receptive field branches to model the fused features at multiple levels of refinement. Experimental results on the Far-water-surface Garbage Dataset (FWSGD), a dataset targeting long-distance water surface garbage, show that LRS-RT-DETR achieves 91.5% mAP@0.5, an improvement of 2.7 percentage points over the RT-DETR-R18 baseline, and 43.9% mAP@0.5-0.95, an improvement of 1.5 percentage points. Meanwhile, the parameter count increases by only 0.6M and the computational overhead remains controllable, demonstrating good potential for real-time deployment on edge devices.
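The Space-to-Depth step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `space_to_depth`, the block size of 2, the channel ordering, and the toy 4 x 4 input are all assumptions made for illustration. It only shows why the rearrangement is lossless: each 2 x 2 spatial cell is moved into four channels, so a map at P2 resolution ends up at P3 resolution with four times as many channels and every value preserved.

```python
def space_to_depth(x, block=2):
    """Rearrange a C x H x W nested list into (C*block*block) x (H/block) x (W/block).

    Every input value appears exactly once in the output, so nothing is discarded.
    The channel ordering used here is an illustrative choice, not taken from the paper.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    # One group of output channels per (dy, dx) offset inside each block x block cell.
    for dy in range(block):
        for dx in range(block):
            for c in range(channels):
                # Keep every block-th row starting at dy, and every block-th column starting at dx.
                out.append([row[dx::block] for row in x[c][dy::block]])
    return out


# Hypothetical stand-in for a P2 feature map: 1 channel, 4 x 4, values 0..15.
p2 = [[[r * 4 + c for c in range(4)] for r in range(4)]]
p2_as_p3 = space_to_depth(p2)  # 4 channels, each 2 x 2
```

In a tensor framework the same rearrangement is usually a single call on a batched tensor. How the rearranged P2 feature is then merged with the existing P3 feature (for example, concatenation followed by a convolution) is not specified in the abstract.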

