Anthropic has published fresh detail on how Claude Fable 5's cybersecurity safeguards actually work, alongside an early draft of a proposed framework for grading the severity of AI jailbreaks.

The company said its safety classifiers sort cybersecurity-related requests into four bands: prohibited use, such as malware development or ransomware; high-risk dual use, covering activities like penetration testing and privilege escalation that mirror legitimate security work but are blocked pending better controls on verifying who is asking; low-risk dual use, including open-source intelligence gathering, which is mostly allowed; and benign use, such as debugging or patch management, which the classifiers are not meant to catch at all.
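As a reading aid, the four bands can be written down as a small lookup table. This is only a sketch of the taxonomy as reported above, not of how Anthropic's classifiers work internally; the names `CyberBand`, `DEFAULT_HANDLING`, `EXAMPLE_ACTIVITIES` and `describe` are hypothetical and were chosen for this illustration.

```python
from enum import Enum


class CyberBand(Enum):
    """The four bands named in the article (identifiers are illustrative)."""
    PROHIBITED = "prohibited use"
    HIGH_RISK_DUAL_USE = "high-risk dual use"
    LOW_RISK_DUAL_USE = "low-risk dual use"
    BENIGN = "benign use"


# Handling per band, paraphrased from the article's description.
DEFAULT_HANDLING = {
    CyberBand.PROHIBITED: "blocked",
    CyberBand.HIGH_RISK_DUAL_USE: "blocked pending better requester verification",
    CyberBand.LOW_RISK_DUAL_USE: "mostly allowed",
    CyberBand.BENIGN: "not targeted by the classifiers",
}

# Only the example activities the article itself mentions.
EXAMPLE_ACTIVITIES = {
    "malware development": CyberBand.PROHIBITED,
    "ransomware": CyberBand.PROHIBITED,
    "penetration testing": CyberBand.HIGH_RISK_DUAL_USE,
    "privilege escalation": CyberBand.HIGH_RISK_DUAL_USE,
    "open-source intelligence gathering": CyberBand.LOW_RISK_DUAL_USE,
    "debugging": CyberBand.BENIGN,
    "patch management": CyberBand.BENIGN,
}


def describe(activity: str) -> str:
    """Return a one-line summary of the band and handling for a known example."""
    band = EXAMPLE_ACTIVITIES[activity]
    return f"{activity}: {band.value} ({DEFAULT_HANDLING[band]})"


if __name__ == "__main__":
    for name in EXAMPLE_ACTIVITIES:
        print(describe(name))
```

A real classifier would have to judge free-form requests rather than match fixed strings; the table only makes the reported band-to-handling mapping explicit.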

A particular focus is vulnerability discovery. Anthropic said it aims to block Fable 5 from finding flaws that other widely available models cannot, while continuing to allow the kind of vulnerability-finding that is standard practice in defensive security.

On jailbreaks, the proposed Cyber Jailbreak Severity scale would score techniques across four factors: how much capability they hand an attacker, how broadly that capability applies, how easily it can be turned into a working attack, and how easy the technique itself is to find. These combine into five bands running from Informational to Critical.
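To make the shape of such a scale concrete, here is a purely illustrative sketch. The article does not say how the four factors are scored or combined, and it names only the two endpoint bands, so everything else below is an assumption: the 0 to 4 scale per factor, the floor-of-the-mean rule, the field names, and the placeholder labels for the three middle bands.

```python
from dataclasses import dataclass, astuple

# Only "Informational" and "Critical" come from the article.
# The three middle labels are placeholders, not the framework's real names.
BANDS = ["Informational", "(band 2)", "(band 3)", "(band 4)", "Critical"]


@dataclass(frozen=True)
class JailbreakFactors:
    """The four factors from the article, each scored 0-4 (assumed scale)."""
    capability: int        # how much capability the technique hands an attacker
    breadth: int           # how broadly that capability applies
    weaponisability: int   # how easily it becomes a working attack
    discoverability: int   # how easy the technique itself is to find


def severity_band(factors: JailbreakFactors) -> str:
    """Map factor scores to one of five bands using an assumed combination rule."""
    scores = astuple(factors)
    if any(not 0 <= s <= 4 for s in scores):
        raise ValueError("each factor must be scored from 0 to 4")
    # Assumed rule: floor of the mean score picks the band index (0..4).
    return BANDS[sum(scores) // len(scores)]


if __name__ == "__main__":
    print(severity_band(JailbreakFactors(0, 0, 0, 0)))
    print(severity_band(JailbreakFactors(4, 4, 4, 4)))
```

A published framework might weight the factors differently or let a single high factor dominate; the averaging here is just the simplest rule that yields five bands from four scores.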

Anthropic developed the framework with its Glasswing partners and is inviting feedback from academia, industry, civil society and government before treating it as a settled standard.


Garbage In, Garbage Faster: Why Agentic AI Exposes Your Organisational Debt
If Agentic AI follows your documented processes, what happens when those processes don’t reflect reality? Most organisations assume AI will figure things out. Business Architect Laura Van Weegen argues the opposite: AI doesn’t create new problems; it removes your ability to ignore the ones that have existed forever and a day. Undocumented workflows, undefined decision ownership, and human workarounds masking broken systems all get amplified at machine speed.

You’ll learn:
• Why “garbage in, garbage faster” is the real Agentic AI risk
• The critical difference between feeding AI data versus information
• How process debt compounds the same way technical debt does
• Why exception handling is the new decision design priority
• What one conversation reveals more than most AI readiness assessments
• How to build explainability in from day one

Key topics: Agentic AI readiness • Information architecture • Process debt • Data vs information • Contextual blindness • Decision ownership • Explainability vs traceability • Semantic infrastructure • Exception handling • Organisational accountability • Workflow documentation • AI governance

Essential viewing for CISOs, CIOs, CFOs, and Chief Legal Officers evaluating Agentic AI deployment before the human safety net disappears.
