Anthropic Study Shows Tiny Data Poisoning Can Backdoor Large Language Models
Anthropic released a report detailing how a small number of malicious documents can poison large language models (LLMs) during pretraining. The research demonstrated that as few as 250 malicious files were enough to embed backdoors in models ranging from 600 million to 13 billion parameters. The findings highlight a practical risk that data‑poisoning attacks may be easier to execute than previously thought. Anthropic collaborated with the UK AI Security Institute and the Alan Turing Institute on the study, urging further research into defenses against such threats.







