SFENet: A Real-Time Semantic Segmentation Network Based on Selective State Models and Frequency-Edge Enhancement
DOI: https://doi.org/10.54097/0cd7g824
Keywords: Real-Time Semantic Segmentation, Selective Scanning, Edge Enhancement
Abstract
Existing real-time semantic segmentation models have made notable progress, but they still suffer from feature redundancy and a limited ability to capture spatial details. To address these issues, this paper proposes SFENet, a real-time semantic segmentation network based on selective state models and frequency-domain spatial detail enhancement. Specifically, a Selective Redundancy Suppression Module (SRSM) is built on a selective scanning mechanism, which retains informative features while suppressing redundant ones. In addition, a Frequency Edge Enhancement Module (FEEM) combines the Fast Fourier Transform (FFT) with an attention mechanism to enhance high-frequency edge information. SFENet is evaluated on two datasets. On Cityscapes, it achieves 77.3% mIoU, 1.4 percentage points higher than the baseline. On CamVid, it achieves 77.1% mIoU. Experimental results demonstrate that SFENet effectively preserves critical semantic features and accurately segments object boundaries.
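The abstract describes SRSM and FEEM only at a high level. The NumPy sketch below illustrates the two underlying techniques under stated assumptions; it is not the authors' implementation. `selective_scan` is a simplified single-channel, Mamba-style selective state-space recurrence over one scan direction (VMamba-style models scan a 2D map along several directions). Its step size delta_t and its projections B_t and C_t depend on the input. When delta_t is close to zero, exp(delta_t * A) is close to 1 and the input term vanishes, so the state carries over and the current input is largely ignored. This is one way selective scanning can suppress redundant inputs. `frequency_edge_enhance` applies a radial high-pass mask in the FFT domain and adds the recovered high-frequency component back through a parameter-free sigmoid channel gate, which stands in for the learned attention in FEEM. All function names, the mask shape and radius, and the gating rule are hypothetical choices made for illustration.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size delta positive.
    return np.logaddexp(0.0, z)


def selective_scan(x, A, w_delta, b_delta, W_B, W_C):
    """Simplified single-channel selective state-space scan over a 1D sequence.

    x: (L,) input sequence (e.g. a flattened row of a feature map).
    A: (N,) negative decay rates of the N hidden states.
    w_delta, b_delta: scalars that make the step size delta_t depend on x_t.
    W_B, W_C: (N,) projections that make B_t and C_t depend on x_t.
    """
    x = np.asarray(x, dtype=float)
    h = np.zeros(A.shape)
    y = np.empty(x.shape)
    for t, x_t in enumerate(x):
        delta = softplus(w_delta * x_t + b_delta)  # input-dependent step size
        B_t = W_B * x_t                            # input-dependent input projection
        C_t = W_C * x_t                            # input-dependent output projection
        # exp(delta * A) discretizes the state transition; delta * B_t is a
        # first-order approximation of the discretized input term.
        h = np.exp(delta * A) * h + delta * B_t * x_t
        y[t] = C_t @ h
    return y


def high_pass_mask(height, width, radius):
    # True outside a disk of the given radius around the centered zero frequency.
    yy, xx = np.ogrid[:height, :width]
    dist = np.sqrt((yy - height // 2) ** 2 + (xx - width // 2) ** 2)
    return dist > radius


def frequency_edge_enhance(feat, radius=2):
    """feat: (C, H, W) feature map; returns feat plus a gated high-frequency residual."""
    _, height, width = feat.shape
    spec = np.fft.fftshift(np.fft.fft2(feat), axes=(-2, -1))   # center the zero frequency
    spec = spec * high_pass_mask(height, width, radius)        # drop low frequencies
    high = np.real(np.fft.ifft2(np.fft.ifftshift(spec, axes=(-2, -1))))
    # Parameter-free channel gate standing in for a learned attention module:
    # channels with above-average high-frequency energy get weights above 0.5.
    energy = np.abs(high).mean(axis=(1, 2))
    gate = 1.0 / (1.0 + np.exp(-(energy - energy.mean())))
    return feat + gate[:, None, None] * high
```

For example, setting `b_delta` to a large negative value drives delta_t toward zero, so the state barely changes and the inputs are effectively ignored. Passing a feature map with a vertical step edge to `frequency_edge_enhance` increases the contrast across the edge, much like unsharp masking, while a constant map passes through unchanged.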
License
Copyright (c) 2026 Academic Journal of Applied Sciences

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.