Anchor-Guided Repair: A Defense Mechanism for Enhancing Stability of Compromised Pretrained Language Models Against Low-Precision and Weight Noise Attacks
Proceedings of IndabaX Nigeria 2026: Building Scalable AI That Works: From Research to Deployment in Resource-Constrained Environments, PMLR 319:368-381, 2026.
Abstract
We propose Anchor-Guided Repair, a defense mechanism for stabilising large language models (LLMs) compromised by weight noise injection and low-precision quantisation attacks. The method retrains the attacked model on clean text with an anchor regularisation loss that penalises large parameter deviations from a clean reference model. The combined objective balances language modelling loss and anchoring regularisation. Tested across various quantisation levels and weighted Gaussian noise attack scenarios, Anchor-Guided Repair consistently improves stability and performance relative to attacked models, demonstrating that anchoring can recover reliability even without proprietary training data.
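The combined objective described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact form of the anchor term or its weight, so the squared L2 distance, the weight `lam`, and the function names `anchor_penalty` and `combined_loss` below are assumptions made for illustration, not the authors' implementation. Parameters are shown as flat lists of floats rather than real model tensors.

```python
from typing import Sequence


def anchor_penalty(params: Sequence[float], anchor: Sequence[float]) -> float:
    """Squared L2 distance between the current parameters and the clean
    reference (anchor) parameters. The squared-L2 form is an assumption."""
    if len(params) != len(anchor):
        raise ValueError("params and anchor must have the same length")
    return sum((p - a) ** 2 for p, a in zip(params, anchor))


def combined_loss(
    lm_loss: float,
    params: Sequence[float],
    anchor: Sequence[float],
    lam: float = 0.1,  # hypothetical regularisation weight
) -> float:
    """Language modelling loss plus the weighted anchor regularisation term."""
    return lm_loss + lam * anchor_penalty(params, anchor)


# Toy values: a clean reference and a perturbed ("attacked") copy.
clean = [0.5, -1.0, 2.0]
attacked = [0.75, -1.5, 2.0]

# lm_loss would come from evaluating the model on clean text; 3.0 is a placeholder.
total = combined_loss(lm_loss=3.0, params=attacked, anchor=clean, lam=0.5)
print(total)
```

In this sketch a larger `lam` pulls the repaired parameters more strongly toward the clean reference, while a smaller `lam` lets the language modelling loss dominate.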