Regulated industries face a clear reality: AI adoption is necessary for remaining competitive, but recent cases in the legal sector highlight the importance of implementing proper safeguards. As firms across sectors integrate these tools, understanding what went wrong—and how to prevent similar issues—becomes essential.
In Alabama, law firm Butler Snow faced sanctions after citing four nonexistent cases generated by ChatGPT in a federal court filing. The firm, which holds substantial state contracts for defending Alabama's prison system, acknowledged that attorney Matthew Reeves used ChatGPT to find supporting case law but failed to verify the citations through standard legal databases before filing.
Similar incidents have occurred across multiple jurisdictions. In Utah, attorney Richard Bednar was sanctioned for filing a brief that referenced a case called "Royer v Nelson"—which existed only in ChatGPT's output. In the UK, an £89 million damages case against Qatar National Bank included 45 case-law citations, of which 18 were entirely fictitious.
The pattern these cases highlight is clear: AI hallucinations in court documents are appearing with increasing regularity across the globe. The situation resembles having a first-year law student working on your case—one who got into the right school and whose father happens to be the firm's biggest client, but who may not be the sharpest tool in the shed.
You'd naturally expect any legal professional to maintain human oversight and carefully check their work, but these global occurrences display a troubling combination of laziness and arrogance. Whatever prestigious connections and educational pedigree lawyers hold, the cases point to a glaring lack of diligence. In an industry where a misplaced comma can cost billions, some soul-searching is in order.
The fundamental issue is that many professionals haven't accepted the current limitations of AI, particularly large language models. These tools are not perfect systems or silver bullets for complex professional work. They're essentially sophisticated draft generators that require careful human review. Understanding AI as a first-draft tool rather than a finished-product solution is essential for safe implementation.
It's crucial to understand that LLMs represent a fundamental departure from "classic AI" systems. Traditional AI follows deterministic rules and algorithms—a calculator always gives the correct answer, a chess engine follows programmed strategies. LLMs, however, work somewhat more like human reasoning: they recognise patterns and generate responses based on probability, which means they can make mistakes, misremember facts, or confidently assert things that aren't true. Much like analysts or politicians. Just as you wouldn't accept that first-year law student's draft without verification, LLM outputs require the same sceptical review.
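To make the distinction concrete, here is a minimal toy sketch in Python, not drawn from any real model: a deterministic function returns the same correct answer every time, while a probabilistic text completer samples from invented, illustrative probabilities and can confidently produce a citation that does not exist.

```python
import random

# Deterministic "classic AI": same input, same correct output, every time.
def add(a: int, b: int) -> int:
    return a + b

# Toy stand-in for an LLM: it completes text by sampling from probabilities,
# so a fluent-sounding but nonexistent case can come out. The continuations
# and weights below are invented purely for illustration.
CONTINUATIONS = {
    "The controlling authority is": [
        ("Strickland v. Washington, 466 U.S. 668 (1984)", 0.6),  # real case
        ("Royer v Nelson", 0.4),                                  # never existed
    ],
}

def toy_complete(prompt: str) -> str:
    options, weights = zip(*CONTINUATIONS[prompt])
    return random.choices(options, weights=weights, k=1)[0]

print(add(2, 2))                                     # always 4
print(toy_complete("The controlling authority is"))  # plausible, not verified
```

The point of the toy is not realism but the failure mode: the probabilistic path produces fluent output whether or not the underlying fact is true.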
Based on this line of thinking, several practical measures emerge. The most critical lesson is that AI tools must always operate with a human in the loop. Every AI-generated output requires human review before use in any professional context. This approach ensures that AI remains a tool to assist professionals, not replace their judgement.
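A human-in-the-loop requirement can be enforced in code as well as in policy. The sketch below is a hypothetical example, assuming Python 3.10+ and internal naming of my own invention: an AI-generated draft cannot be released until a named reviewer has explicitly signed it off.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Draft:
    text: str
    source: str = "llm"             # where the draft came from
    approved_by: str | None = None  # set only after human review
    approved_at: datetime | None = None

def approve(draft: Draft, reviewer: str) -> Draft:
    """Record that a named human has reviewed and accepted the draft."""
    draft.approved_by = reviewer
    draft.approved_at = datetime.now(timezone.utc)
    return draft

def release(draft: Draft) -> str:
    """Block any AI-generated draft that has not been human-approved."""
    if draft.source == "llm" and draft.approved_by is None:
        raise PermissionError("AI draft has not been reviewed by a human")
    return draft.text

draft = Draft(text="Proposed filing text ...")
# release(draft)  # would raise: no human has signed off yet
release(approve(draft, reviewer="senior.partner@example.com"))
```

The design choice is deliberate: review is not a convention people are asked to remember, it is a gate the workflow cannot pass without a human's name attached.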
Mandatory verification protocols represent another essential safeguard. Every AI-generated output must be verified through authoritative sources specific to your industry. For lawyers, this means checking every citation in Westlaw, PACER, or equivalent databases. Butler Snow has since instituted a firm policy requiring attorneys to seek approval when using AI for legal research.
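As a sketch of what such a protocol looks like in practice: Westlaw and PACER do not expose a simple lookup like the one below, so `lookup_case` is a hypothetical stand-in for whatever licensed database query a firm actually uses. The shape is what matters: every citation the model suggests is checked against a source of record, and anything unverified blocks the filing.

```python
# Hypothetical verification protocol: every AI-suggested citation must be
# confirmed in an authoritative database before it reaches a filing.
# `lookup_case` is a placeholder, not a real Westlaw/PACER API.

KNOWN_CASES = {"Strickland v. Washington, 466 U.S. 668 (1984)"}  # stand-in data

def lookup_case(citation: str) -> bool:
    """Placeholder for a query against a licensed legal database."""
    return citation in KNOWN_CASES

def verify_citations(citations: list[str]) -> list[str]:
    """Return the citations that could NOT be confirmed; empty means safe."""
    return [c for c in citations if not lookup_case(c)]

ai_citations = [
    "Strickland v. Washington, 466 U.S. 668 (1984)",
    "Royer v Nelson",  # the ChatGPT-only case from the Utah incident
]
unverified = verify_citations(ai_citations)
if unverified:
    raise ValueError(f"Do not file: unverified citations {unverified}")
```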
Clear organisational policies have become urgent priorities. Following the UK incidents, Dame Victoria Sharp called on the Bar Council and Law Society to implement urgent measures. She emphasised that heads of barristers' chambers and managing partners must ensure all lawyers understand their professional and ethical duties when using AI.
The Utah case revealed that an "unlicensed law clerk" had written the problematic brief, highlighting the need for comprehensive training at all levels of an organisation. I'd love to see how that was initially billed. Training must emphasise the importance of human verification and the limitations of AI tools.
While the legal sector deals with these challenges retroactively, the UK's Financial Conduct Authority (FCA) is taking a somewhat different approach. The FCA announced a "supercharged sandbox" programme in collaboration with Nvidia, allowing banks and financial firms to experiment with AI under regulatory supervision.
Jessica Rusu, the FCA's chief data officer, explained: "This collaboration will help those that want to test AI ideas but who lack the capabilities to do so. We'll help firms harness AI to benefit our markets and consumers, while supporting economic growth."
The sandbox, set to begin operating in October, will focus on practical applications such as identifying authorised push payment fraud and detecting stock market manipulation. This controlled environment allows firms to explore AI's benefits while maintaining regulatory oversight and ensuring human professionals remain central to all decision-making processes.
The contrast between the legal sector's reactive response and the financial sector's proactive approach offers a roadmap for other regulated industries. As Jochen Papenbrock from Nvidia notes, "AI is fundamentally reshaping the financial sector by automating processes, enhancing data analysis, and improving decision-making."
The legal cases serve not as reasons to avoid AI, but as valuable lessons in implementation. These incidents underscore that AI, and large language models in particular, is not a panacea but a tool whose output must be checked. Accepting this reality—that current AI provides starting points rather than final answers—is crucial for responsible adoption. Courts have shown leniency when lawyers take responsibility for their mistakes—the serious sanctions typically come when there's a failure to acknowledge errors or implement corrective measures.
The experiences of the legal sector, combined with the FCA's sandbox approach, point to several essential practices. Always maintaining a human in the loop stands as the most critical requirement—AI should augment human expertise, never replace human judgement. Verification protocols must be implemented before any AI-generated content is used in an official capacity. Clear organisational policies on AI use need to be established and communicated at all levels. Creating controlled environments for testing AI applications before full deployment provides another layer of safety. Finally, organisations should document every AI interaction and the human review it received, as sketched below: if you don't log it, it didn't happen.
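On that last point, a minimal audit-trail sketch using only the Python standard library; the file path and field names are my own illustrative choices, not a prescribed format. Each AI interaction is appended to a JSONL log with the prompt, the output, the reviewer, and the decision, so there is a record to point to when questions arise.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

AUDIT_LOG = Path("ai_audit_log.jsonl")  # illustrative location

def log_ai_use(prompt: str, output: str, reviewer: str, decision: str) -> None:
    """Append one reviewed AI interaction to an append-only JSONL audit log."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "output": output,
        "reviewer": reviewer,
        "decision": decision,  # e.g. "approved", "rejected", "edited"
    }
    with AUDIT_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_ai_use(
    prompt="Find authority supporting the motion to dismiss",
    output="Draft paragraph citing two cases ...",
    reviewer="associate@example.com",
    decision="edited",
)
```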
As regulated industries continue to adopt AI tools, the goal isn't to avoid the technology but to use it responsibly with consistent human oversight. The legal profession's experience, while challenging, provides valuable guidance for other sectors navigating this transition. With proper safeguards, training, and human-centred oversight, AI can enhance professional services while maintaining the standards that regulated industries require.