Sleeper Agents in the Weights: Microsoft’s New Scanner Unmasks Hidden Backdoors in Open-Weight LLMs
Microsoft has published new technical research on detecting backdoors in open-weight large language models (LLMs), particularly models intended to run locally. The research addresses a covert vulnerability in which a model’s behavior remains...