SFENet: A Real-Time Semantic Segmentation Network Based on Selective State Models and Frequency-Edge Enhancement

Authors

Kaiyu Zhou, School of Software, Henan Polytechnic University, 2001 Century Avenue, Jiaozuo 454000, China

DOI: https://doi.org/10.54097/0cd7g824

Keywords: Real-Time Semantic Segmentation, Selective Scanning, Edge Enhancement

Abstract

Existing real-time semantic segmentation models have made notable progress, but they still suffer from feature redundancy and capture spatial details poorly. To address these issues, this paper proposes SFENet, a real-time semantic segmentation network based on selective state models and frequency-domain spatial detail enhancement. Specifically, a Selective Redundancy Suppression Module (SRSM) is designed around a selective scanning mechanism that retains informative features while suppressing redundant ones. In addition, a Frequency Edge Enhancement Module (FEEM) combines the Fast Fourier Transform (FFT) with an attention mechanism to enhance high-frequency edge information. SFENet is evaluated on two datasets: on Cityscapes it achieves 77.3% mIoU, outperforming the baseline by 1.4% mIoU, and on CamVid it achieves 77.1% mIoU. Experimental results demonstrate that SFENet effectively preserves critical semantic features and accurately segments object boundaries.
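
The general idea of using the FFT to isolate high-frequency edge content can be illustrated with a minimal NumPy sketch. This is not the paper's FEEM: the function names, the `cutoff_ratio` parameter, the ideal circular high-pass mask, and the sigmoid gate standing in for an attention mechanism are all assumptions made for illustration on a single 2D feature map.

```python
import numpy as np

def high_frequency_component(feature, cutoff_ratio=0.1):
    """Return the high-frequency part of a 2D map using an FFT high-pass mask.

    cutoff_ratio is a hypothetical parameter: frequencies within
    cutoff_ratio * min(H, W) of the spectrum centre are discarded.
    """
    h, w = feature.shape
    # Move the zero-frequency (DC) term to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(feature))
    yy, xx = np.ogrid[:h, :w]
    radius = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    # Keep only frequencies outside the low-frequency disc (DC is always dropped).
    mask = radius > cutoff_ratio * min(h, w)
    filtered = spectrum * mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))

def enhance_edges(feature, cutoff_ratio=0.1):
    """Add gated high-frequency detail back onto the input map.

    The sigmoid gate is an illustrative stand-in for attention,
    not the attention design used in the paper.
    """
    high = high_frequency_component(feature, cutoff_ratio)
    gate = 1.0 / (1.0 + np.exp(-np.abs(high)))  # values in [0.5, 1)
    return feature + gate * high

# Usage: a vertical step edge between columns 7 and 8 of a 16x16 map.
step = np.zeros((16, 16))
step[:, 8:] = 1.0
high = high_frequency_component(step)
enhanced = enhance_edges(step)
```

In this sketch a constant map passes through unchanged, because all of its energy sits in the discarded zero-frequency term, while the response on the step image is strongest next to the edge.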
