Latent Cognitive Reinforcement for Anti-Backdoor Strategy

Authors

  • Yanwen Wang, School of Physics and Electronic Engineering, Northeast Petroleum University, Daqing, Heilongjiang 163318, China
  • Yanchang Liu, School of Physics and Electronic Engineering, Northeast Petroleum University, Daqing, Heilongjiang 163318, China
  • Xiaorou Zhang, School of Physics and Electronic Engineering, Northeast Petroleum University, Daqing, Heilongjiang 163318, China

DOI:

https://doi.org/10.54097/bwmwmp38

Keywords:

Neural networks, Security, Backdoor attacks, Backdoor defense, Latent patterns

Abstract

With the widespread deployment of deep neural networks in sensitive applications, backdoor attacks have become a serious security concern. By embedding hidden triggers in a model, such attacks allow adversaries to manipulate its predictions without affecting the normal classification of clean samples. To address the limitations of existing defenses against adaptive attacks and their limited generalization ability, this paper proposes the Latent Cognitive Reinforcement (LCR) backdoor defense strategy. The method uses multi-layer feature representations from the intermediate and output layers to extract the model's cognitive patterns, and thereby identifies and eliminates potential backdoor trigger signals. Compared with traditional approaches such as anomaly detection, neuron pruning, and adversarial training, LCR generalizes better and is more robust. Experiments on two datasets, CIFAR-10 and GTSRB, and five mainstream model architectures show that LCR achieves an average detection performance (AUROC) of 91.80%, significantly outperforming existing state-of-the-art defense techniques and demonstrating good adaptability across models and attack scenarios.
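
The abstract describes LCR only at this level of detail, so the sketch below is an illustration of the general idea rather than the authors' implementation: it hooks two intermediate layers of a classifier, flattens their activations, and scores each input by how well those activations agree with per-class mean features estimated on clean data, the intuition being that a trigger steers the output label away from the "cognitive pattern" the clean class normally produces. The layer choices, the cosine-similarity scoring rule, and all names (extract_features, cognitive_score, clean_means) are assumptions made for exposition.

# Illustrative sketch only: the page gives no code, so the layer choices, the
# scoring rule, and every name below are assumptions, not the LCR method itself.
import torch
import torch.nn as nn
import torchvision.models as models


def extract_features(model, layer_names, x):
    """Collect flattened intermediate activations via forward hooks, plus the logits."""
    feats, handles = {}, []
    for name, module in model.named_modules():
        if name in layer_names:
            handles.append(module.register_forward_hook(
                lambda mod, inp, out, n=name: feats.__setitem__(n, out.flatten(1))))
    logits = model(x)
    for h in handles:
        h.remove()
    return feats, logits


@torch.no_grad()
def cognitive_score(model, layer_names, x, clean_means):
    """Average cosine similarity between each input's intermediate features and the
    clean-data mean features of its predicted class; a low score means the output
    label disagrees with the intermediate-layer pattern, i.e. a possible trigger."""
    feats, logits = extract_features(model, layer_names, x)
    pred = logits.argmax(dim=1)
    score = torch.zeros(x.size(0))
    for name, f in feats.items():
        ref = clean_means[name][pred]  # (batch, feat_dim) per-class reference features
        score += nn.functional.cosine_similarity(f, ref, dim=1)
    return score / len(feats)


if __name__ == "__main__":
    model = models.resnet18(num_classes=10).eval()
    layer_names = ["layer3", "layer4"]          # assumed intermediate layers
    x = torch.randn(8, 3, 32, 32)               # CIFAR-10-sized inputs
    # clean_means should hold per-class mean features estimated on held-out clean
    # data; random tensors stand in here only so the sketch runs end to end.
    with torch.no_grad():
        feats, _ = extract_features(model, layer_names, x)
    clean_means = {n: torch.randn(10, f.size(1)) for n, f in feats.items()}
    print(cognitive_score(model, layer_names, x, clean_means))

In practice, one would estimate clean_means from a small held-out clean set and flag inputs whose score falls below a threshold calibrated on that set; the AUROC reported in the abstract could then be computed over clean versus triggered inputs under this kind of scoring.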

Published

15-04-2026

Issue

Vol. 1 No. 2 (2026)

Section

Articles

How to Cite

Wang, Y., Liu, Y., & Zhang, X. (2026). Latent Cognitive Reinforcement for Anti-Backdoor Strategy. Academic Journal of Applied Sciences, 1(2), 39-45. https://doi.org/10.54097/bwmwmp38