The shift from traditional software to AI systems is forcing enterprises to rethink where security lives. In conventional DevSecOps (development, security, and operations) models, assurance is largely established before deployment through testing, validation, and code review. In AI systems, that assumption no longer holds.

As Google security advocates Priya Pandey (Developer Advocate Manager), Aron Eidelman (Senior DevRel Engineer, Security), and Leonid Yankulin (Senior DevRel Engineer) discussed in a recent AI-360 webinar, the move to probabilistic, context-sensitive systems is pushing security decisively into runtime and exposing new risks in areas as foundational as observability.

Yankulin framed the core issue in technical terms: “the model behaviour is probabilistic, and as such, it is not deterministic.” This breaks a central premise of traditional testing. “Two runs of the same tests can produce completely different results for the same software,” he noted. The implication is operationally significant. Pre-deployment validation, long the backbone of enterprise security assurance, cannot reliably predict how an AI system will behave in production.

As a result, testing must evolve into a continuous discipline. Yankulin described the need to “do the testing, not only one time, but continuously,” incorporating what is increasingly referred to as evaluation testing alongside unit and integration testing. This is less a tooling change than a shift in control philosophy: assurance becomes probabilistic, measured over time, rather than binary and fixed at release.
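The idea of assurance "measured over time, rather than binary and fixed at release" can be sketched in a few lines. The harness below is a hypothetical illustration, not a tool named in the webinar: it runs the same prompt repeatedly and reports a pass rate instead of a single pass/fail verdict. The `run_model` stub simulates non-deterministic model output.

```python
import random

def run_model(prompt: str) -> str:
    """Stand-in for a real model call; simulates non-deterministic output."""
    return random.choice(["SAFE RESPONSE", "UNSAFE RESPONSE"])

def evaluate_pass_rate(prompt: str, check, runs: int = 50) -> float:
    """Run the same prompt repeatedly and return the fraction of runs that
    satisfy the check -- assurance as a rate, not a one-time boolean."""
    passes = sum(1 for _ in range(runs) if check(run_model(prompt)))
    return passes / runs

# The result is a probability to track over time, not a gate that is
# passed once at release.
rate = evaluate_pass_rate(
    "Summarize this document.",
    check=lambda response: "UNSAFE" not in response,
)
```

In practice such an evaluation would run continuously against production-like traffic, with the pass rate tracked as a metric and alerted on when it drifts.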

Aron Eidelman expanded on the limitations this creates for security controls. Traditional mechanisms such as rule-based protections rely on predictable behavior and known attack patterns. That model struggles in AI systems, where even well-understood vulnerabilities can behave inconsistently. “Sometimes an attack will work successfully and consistently over 50 consecutive attempts, and other times it just completely fails,” he said, describing internal prompt injection testing. “We can’t lean on a completely non-deterministic system to have to meet certain requirements.”

This unpredictability drives the need for runtime enforcement layers. Rather than relying solely on pre-deployment safeguards, enterprises must monitor and control inputs and outputs as they occur. Yankulin described this emerging category as focused “on the semantic and content layer rather than on the network,” distinguishing it from traditional runtime security tools. These systems analyze prompts, responses, and tool interactions in real time, applying policy and filtering based on context rather than static rules.
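A minimal sketch of what screening "on the semantic and content layer" can look like at the input side is shown below. The deny patterns are illustrative assumptions; a real deployment would rely on a managed runtime security service (the webinar mentions tools such as Model Armor) rather than hand-written regexes, which a determined attacker can rephrase around.

```python
import re

# Hypothetical deny patterns for demonstration only; production systems
# would use a semantic classifier or managed service, not regexes alone.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"reveal your system prompt", re.IGNORECASE),
]

def screen_prompt(prompt: str) -> tuple[bool, str]:
    """Inspect a prompt at the content layer before it reaches the model.
    Returns (allowed, reason)."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(prompt):
            return False, f"blocked: matched {pattern.pattern!r}"
    return True, "allowed"

allowed, reason = screen_prompt("Please ignore previous instructions and ...")
# allowed is False here, so the request never reaches the model
```

The same gate would sit on the output side as well, checking model responses and tool-call arguments before they leave the system.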

The comparison to web application firewalls (WAFs) is instructive but limited. Eidelman noted that WAFs historically created a false sense of coverage. In practice, WAFs are constrained by known rules and patterns. AI systems, by contrast, introduce what he described as a “fuzzy system to solve a fuzzy problem,” where unknown behaviors are the norm rather than the exception. This makes runtime controls necessary but inherently imperfect.

The operational challenge is compounded by the rise of AI agents. These systems introduce dynamic, multi-step interactions that further erode the boundary between build-time and runtime. Pandey highlighted that “agents make tool calls every time when you’re making that decision,” meaning security must account not just for code execution but for intent and context at each step. Static validation cannot anticipate these interactions; they must be governed as they happen.

At the same time, the expansion of runtime visibility introduces a less obvious but equally critical risk: observability itself. The standard approach in modern systems has been to maximize telemetry collection—logs, traces, and metrics—to support debugging, performance optimization, and security monitoring. In AI systems, that approach can expose sensitive data at scale.

Yankulin argued that the fundamentals of observability remain intact: “We have three main types of data that we collect. Metrics, traces and logs. AI doesn’t bring anything new into this domain,” at least not from a structural perspective. What changes is the content. Prompts and model interactions often contain unstructured, sensitive information, including personal data and proprietary business logic.

“The prompt is very often human language, so people can type and enter practically anything,” he said. This creates a new attack surface within observability pipelines. Sensitive data can be captured, stored, and accessed by systems or teams that were never intended to handle it. In some cases, the risk is not an external attack but internal exposure through standard logging practices.

DevSecOps for AI: Why 90% Stays the Same—and the 10% That Changes Everything

Is your DevSecOps pipeline ready for AI—or just ready for the AI you tested last week? AI systems behave probabilistically. The same prompt injection attack can succeed 50 times in a row, then fail completely the next minute. Traditional shift-left testing was built for determinism. AI isn’t. That gap is where risk lives. Three members of Google’s security advocacy team break down what actually changes—and what doesn’t—when AI enters your DevSecOps pipeline.

You’ll learn:
• Why 90% of AI security is still traditional security—and exactly where the novel 10% creates new exposure
• Why DevSecOps transformations fail within a year—and the top-down cultural shift that prevents it
• How the latest DORA research shows AI agents amplify existing practices, good or bad, at scale
• What AI runtime security (e.g., Model Armor) does that a WAF cannot
• Why AI logs capturing PII and system instructions in plain text demand a new approach to observability

Key topics: Non-determinism in AI testing • Continuous evaluation vs. pre-deployment scans • Model Armor & runtime security layers • Sensitive data redaction in logs • Prompt injection defense-in-depth • Agentic workload security • WAF limitations with AI agents • DevSecOps governance & top-down culture

For CISOs, DevSecOps leads, and security architects navigating AI adoption: the pipeline you spent three years building is mostly still valid. This session tells you exactly what to add. All viewers will receive a ‘cheat sheet’ compiling links galore courtesy of Aron Eidelman.

Eidelman reinforced this point by drawing parallels to existing systems such as customer support chats in regulated industries. Logs are necessary for troubleshooting and auditability, but full visibility creates compliance and security challenges. Traditional encryption approaches are insufficient because they operate at a coarse level. “You either have the key and you can read everything, or you can’t see any of it,” he explained.

The emerging requirement is more granular control. “The ability to redact or mask specific characters in a string, or even encrypt a specific string, is an added capability,” Eidelman said. This allows systems to retain operational visibility while limiting exposure of sensitive data. The familiar pattern of displaying only partial identifiers—such as the last four digits of a credit card—becomes a model for AI observability.
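The last-four-digits pattern is simple to express in code. The helper below is a hypothetical sketch of character-level masking, the granular middle ground between full visibility and all-or-nothing encryption described above.

```python
def mask_identifier(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask all but the last `visible` characters of an identifier,
    mirroring the familiar last-four-digits credit card pattern."""
    if len(value) <= visible:
        # Too short to partially reveal; mask everything.
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]

mask_identifier("4111111111111111")  # -> "************1111"
```

Applied to log fields, this keeps enough of the value for operators to correlate records without exposing the full identifier to every system that touches the logs.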

The complexity increases when AI systems interact with external tools and services. Without controls, sensitive data can propagate beyond the original system boundary. Eidelman noted the need “to prevent certain types of sensitive data from getting to the model” and to stop it from being forwarded to downstream systems. This extends security concerns from storage and access to data flow and transformation.

Pandey emphasized the importance of linking observability back to security operations. In agent-driven environments, “agents act on behalf of users,” raising questions about attribution and accountability. Enterprises must ensure that actions can be traced “back to both the agent and the user,” enabling meaningful alerting and investigation. This is not a new requirement in principle, but the mechanisms must adapt to autonomous and semi-autonomous systems.
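Tracing actions "back to both the agent and the user" implies that every tool call emits an audit record carrying both identities. The record shape below is an assumption for illustration; field names and the logging transport would differ per platform.

```python
import json
import time

def log_agent_action(agent_id: str, user_id: str, tool: str, arguments: dict) -> str:
    """Emit a structured audit record tying a tool call to both the agent
    that performed it and the user on whose behalf it acted."""
    record = {
        "timestamp": time.time(),
        "agent_id": agent_id,   # which agent performed the action
        "user_id": user_id,     # whose authority it acted under
        "tool": tool,
        "arguments": arguments,
    }
    return json.dumps(record)

entry = log_agent_action("billing-agent-01", "user-4821", "refund", {"order": "A-93"})
```

With both identifiers on every record, alerting rules can distinguish a compromised agent acting across many users from a single user abusing one agent.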

The combined effect of these changes is a shift in how enterprises approach AI deployment at scale. Security is no longer a discrete phase in the development lifecycle but an ongoing function embedded in runtime operations. Observability, once a purely enabling capability, becomes a potential source of risk that must be actively managed.

Strategically, this has implications for tooling, architecture, and governance. Enterprises must invest in runtime controls that operate at the level of data and semantics, not just infrastructure. They must redesign observability pipelines to handle unstructured and sensitive inputs safely. And they must align security, development, and operations teams around continuous evaluation rather than point-in-time assurance.

As AI systems become more central to business processes, these considerations move from edge cases to core requirements. The transition is not about replacing existing security practices but extending them into domains where behavior cannot be fully predicted and data cannot be assumed to be safe once logged. In that context, runtime becomes the primary control plane—and observability, if not carefully managed, becomes part of the attack surface.

