Anthropic's Claude AI models are displaying "early warning" signs of rapid progress in cybersecurity and biology capabilities, approaching undergraduate-level cybersecurity skills and expert-level biological knowledge in some areas, according to the company's Frontier Red Team assessment released on March 19, 2025. However, present-day models still fall short of the thresholds Anthropic considers to pose substantially elevated national security risks.
The assessment, based on work across four model releases over the past year, shows Claude improving from high-school to undergraduate level on Capture The Flag (CTF) cybersecurity exercises. Claude 3.7 Sonnet solves about one-third of Cybench CTF challenges within five attempts, up from five percent for the previous frontier model. The improvements span multiple cybersecurity categories, including discovering vulnerabilities in insecure software, web applications, and cryptographic protocols.
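For readers unfamiliar with "solved within five attempts" as a metric, the minimal sketch below shows one way such a solve rate could be computed from per-challenge pass/fail attempt logs. It is illustrative only and not Anthropic's evaluation harness; the function names and the sample data are hypothetical.

```python
# Minimal sketch (not Anthropic's harness): estimating the share of CTF
# challenges solved within k attempts from per-challenge attempt logs.

def solved_within_k(attempts: list[bool], k: int) -> bool:
    """True if any of the first k attempts on a challenge succeeded."""
    return any(attempts[:k])

def solve_rate(results: dict[str, list[bool]], k: int = 5) -> float:
    """Fraction of challenges solved within k attempts."""
    solved = sum(solved_within_k(attempts, k) for attempts in results.values())
    return solved / len(results)

# Hypothetical attempt logs: challenge name -> pass/fail per attempt.
attempt_results = {
    "web_easy":   [False, True],
    "crypto_med": [False, False, False, False, False],
    "pwn_hard":   [False, False, True],
}

print(f"Solve rate within 5 attempts: {solve_rate(attempt_results):.0%}")
```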
In realistic cyber range experiments with Carnegie Mellon University involving approximately 50 hosts, models could not succeed autonomously in the network environment. However, when equipped with cybersecurity research tools, Claude successfully replicated attacks similar to known large-scale thefts of personally identifiable information from credit reporting agencies.
Biology capabilities show similarly swift progress: within one year, Claude advanced from underperforming world-class virology experts to comfortably exceeding their baseline on laboratory troubleshooting scenarios. Internal testing shows models approaching human expert baselines on biology protocols and DNA/protein sequence manipulation, with the latest model exceeding the human baseline on cloning workflows. Models remain inferior to human experts at interpreting scientific figures.
Controlled weaponisation studies found Claude provided "some amount of uplift" to novices compared with participants without model access, though the highest-scoring plans contained critical real-world failure points. Expert red-teaming produced mixed conclusions: some experts noted improvements in knowledge, while others identified too many critical planning failures for successful end-to-end attack execution.
Anthropic's pre-deployment testing partnerships with the US AI Safety Institute and the UK AI Security Institute contributed to the Claude 3.7 Sonnet capability assessments that inform AI Safety Level determinations. The company also established a first-of-its-kind partnership with the National Nuclear Security Administration to evaluate the models' nuclear and radiological risk knowledge in a classified environment, demonstrating the potential for public-private collaboration in sensitive domains.
The rapid capability advancement positions AI models closer to the thresholds that would require AI Safety Level 3 safeguards, prompting increased investment in security. Anthropic says its Responsible Scaling Policy and evaluation partnerships give it confidence to develop models faster while advancing responsibly. The company emphasises running automated evaluations more frequently and deepening government collaboration to improve risk assessment across its focal areas.