Continuous AI Assurance Still Starts With a Point in Time


Written by Shea Brown

Posted on 05/17/2026

Last week, the Australian Prudential Regulation Authority (APRA) published a letter to industry summarizing what it learned from a targeted engagement with a group of large banks, insurers, and superannuation trustees on their AI adoption practices. The substance of the letter is not surprising to anyone who has spent time in this space, but one passage in particular caught my attention and is worth quoting directly:

APRA also observed reliance on point in time and sample based assurance methods, despite these methods being ill suited to probabilistic models that learn, adapt and degrade over time. Few entities had continuous validation or monitoring in place to detect issues such as model drift, bias, failure modes, or control breakdowns in a timely manner.

I agree with this. I have written before about the inadequacy of treating a single audit as a durable claim about a system that is changing underneath it, and I think anyone working seriously in AI assurance has to take the critique on board. The dynamic nature of the systems we are evaluating is real, and so is the risk that an audit conducted in March no longer says anything useful about the system that is actually in production in September.

What I want to push back on, gently, is the inference that often follows from there (at least in conversations that I’m having). The inference is something like, “and therefore continuous assurance replaces point-in-time assurance,” or, “point-in-time assurance is ill-suited for, and has no place in, AI assurance.” In my view, this is like throwing the proverbial baby out with the bathwater.

The most authoritative new document in this space, the recently published ETSI TS 104 008 V1.1.1 on Continuous Auditing-Based Conformity Assessment (CABCA) for AI-enabled systems, actually concedes this point. I want to walk through why, partly because the standard itself is worth understanding, and partly because the architecture it describes is a good way to think about what APRA is asking for. The frame I will keep coming back to is what I’ll call the tiered assurance stack, and the recurring problem I will keep seeing is the bootstrapping problem in continuous AI assurance.

What is CABCA, and what does the new ETSI standard actually do?

Continuous Auditing-Based Conformity Assessment is the methodology specified by ETSI Technical Specification TS 104 008 V1.1.1 (January 2026) for ongoing conformity evaluation of AI-enabled systems against applicable regulations and standards. The core idea is to replace one-off, manual, point-in-time audits with an automated cycle that continuously gathers evidence about an AI system, compares it against pre-defined thresholds, and updates the system’s conformity status in something close to real time. The framing in the standard is that traditional periodic audits offer a “snapshot of compliance at a particular moment,” which is inadequate for systems that evolve continuously through new data and model updates.

The methodology breaks into three layers. The first is Scoping, which produces a Conformity Specification, a document that consolidates all the requirements the system needs to meet, drawn from regulations, harmonized standards, sector rules, and internal policies. The second is Operationalization, which translates those high-level requirements into machine-readable metrics, defined measurements, and a configured automation pipeline. The output of this phase is the Operationalization Specification, the artifact that tells the continuous assessment engine what to measure and what thresholds to compare against. The third is the continuous loop itself, four steps running on a recurring trigger: continuous evidence gathering and measurement, automated analysis and findings mapping, continuous reporting and status updates, and iterative follow-up and monitoring.
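To make the loop concrete, here is a minimal Python sketch of a single assessment cycle. It is an illustration under stated assumptions, not the standard's reference implementation: the names MetricSpec, Finding, run_cycle, conformity_status, and follow_up, the requirement IDs, and the metric values are all hypothetical, and a real Operationalization Specification would carry far more detail than a name and a threshold.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class MetricSpec:
    """One entry of a hypothetical Operationalization Specification."""
    name: str
    requirement_id: str      # the conformity requirement this metric supports
    threshold: float
    higher_is_better: bool


@dataclass(frozen=True)
class Finding:
    metric: str
    requirement_id: str
    value: float
    conformant: bool


def run_cycle(spec: List[MetricSpec], gather: Callable[[str], float]) -> List[Finding]:
    """Steps 1 and 2: gather a measurement per metric and map it to a finding."""
    findings = []
    for m in spec:
        value = gather(m.name)
        ok = value >= m.threshold if m.higher_is_better else value <= m.threshold
        findings.append(Finding(m.name, m.requirement_id, value, ok))
    return findings


def conformity_status(findings: List[Finding]) -> str:
    """Step 3: roll the findings up into a reportable status."""
    return "conformant" if all(f.conformant for f in findings) else "non-conformant"


def follow_up(findings: List[Finding]) -> List[str]:
    """Step 4: list the requirements that need follow-up."""
    return [f.requirement_id for f in findings if not f.conformant]


# Illustrative values only; none of these numbers come from the standard.
spec = [
    MetricSpec("accuracy", "REQ-1", 0.90, higher_is_better=True),
    MetricSpec("demographic_parity_gap", "REQ-2", 0.05, higher_is_better=False),
]
evidence = {"accuracy": 0.93, "demographic_parity_gap": 0.08}

findings = run_cycle(spec, evidence.__getitem__)
print(conformity_status(findings))  # non-conformant
print(follow_up(findings))          # ['REQ-2']
```

Everything the cycle concludes is inherited from spec: the choice of metrics, the thresholds, and the mapping to requirements. The code can report on the system, but it cannot say whether those choices were the right ones.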

The standard supports three modes for what happens with the evidence the loop produces. In self-assessment, the AI provider acts as its own auditing party. In third-party assessment, an independent external body reviews the assessment results and issues an attestation. In the certification path, an accredited certification body uses the continuous evidence stream to maintain what the standard calls a “living certificate,” with renewals and revocations driven by the continuous data.

If you stop reading at that summary, CABCA looks like exactly what APRA is asking for. Continuous, automated, lifecycle-spanning conformity assessment, replacing the brittle annual audit. Problem solved, all the way down.

What does CABCA require before the continuous loop can start?

Read further, and you find that CABCA is not replacing point-in-time work: it’s shifting where this work occurs under a continuous regime. The clearest statement of this is in clause 5.1 of the standard itself, which is worth quoting at length:

Foundational trust in the audit method is established through an initial setup process… The key output of this foundational phase is the Operationalization Specification artifact… provided to Stakeholders for a formal Initial Review and Approvement… This foundational approval process is conducted once at the start and is repeated only when significant changes are made to the AI-System, including its Data, or its Conformity Specifications… Ongoing trust in the audit results is maintained through a frequent assessment cycle.

ETSI TS 104 008 V1.1.1, clause 5.1

The standard separates two distinct kinds of trust the methodology has to produce. Foundational trust in the audit method is established when stakeholders formally review and approve the Operationalization Specification before the continuous loop begins, conducted once at the start and repeated only when significant changes are made. Ongoing trust in the audit results is what the continuous loop maintains between those foundational reviews.

That foundational approval is a point-in-time gate, by another name. It is the act of an informed reviewer, looking at the as-designed monitoring apparatus on a particular day, applying judgment to the question of whether the operationalization will actually generate evidence that supports the conformity claims the system wants to make. The continuous loop runs on top of that approval. It does not replace it. It cannot replace it, because the loop is itself a piece of software, configured against a specification, that produces conclusions about another piece of software. Without the foundational review, the loop is a confident report writer with no warrant.
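One way to picture the gate is as a check the loop must pass before it is allowed to run. The sketch below is a simplified assumption rather than anything the standard prescribes: fingerprint, Approval, approve, and may_run_loop are hypothetical names, the reviewer and date are made up, and it treats any change to the specification as invalidating the approval, which is stricter than the standard's "significant changes" test. It also ignores changes to the AI system and its data, which a real gate would have to track as well.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date
from typing import Optional


def fingerprint(spec: dict) -> str:
    """Stable hash of a specification, independent of key order."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    """Record of a human review of one exact version of the specification."""
    spec_fingerprint: str
    reviewer: str
    approved_on: date


def approve(spec: dict, reviewer: str, on: date) -> Approval:
    return Approval(fingerprint(spec), reviewer, on)


def may_run_loop(spec: dict, approval: Optional[Approval]) -> bool:
    """The continuous loop may run only against the version that was approved."""
    return approval is not None and approval.spec_fingerprint == fingerprint(spec)


# Illustrative values only.
spec = {"metrics": [{"name": "accuracy", "threshold": 0.9}]}
approval = approve(spec, "independent reviewer", date(2026, 1, 15))

changed = {"metrics": [{"name": "accuracy", "threshold": 0.8}]}

print(may_run_loop(spec, approval))     # True
print(may_run_loop(changed, approval))  # False: the threshold moved after approval
print(may_run_loop(spec, None))         # False: nothing was ever approved
```

The hash only proves that the loop is running the version a reviewer signed off on. The judgment about whether that version deserves sign-off still happens outside the code, on the date recorded in approved_on.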

Clause 5.2 hardens this. It defines a list of mandatory prerequisites that the organization has to meet before CABCA can be implemented at all. A comprehensive knowledge base of applicable conformity requirements. Demonstrated technical and operational expertise in translating those requirements into measurable, machine-readable metrics. A robust monitoring infrastructure. A risk management capability tailored to the specific system in question, with named owners. None of these prerequisites are produced by the continuous loop. They are required before the continuous loop is allowed to begin, and whether the organization actually has them, in the way the standard demands, is a question that has to be answered by someone qualified to evaluate it, on a particular day, against retrospective evidence.

Clause 6.3.3 closes the circle for the third-party path. The auditing party in third-party mode “shall verify or formally accept the scope and operationalization defined by the Auditee.” The standard is unambiguous here. The independent third party is not turning up to ratify the continuous loop’s outputs after the fact; it is taking responsibility for the foundational specification that those outputs depend on. That verification is not a continuous activity. It is a point-in-time engagement.

So when the standard critiques traditional approaches for offering a snapshot of compliance, it is making a narrower claim than people sometimes read it as making. The claim is that a snapshot of the AI system’s behavior is inadequate because the behavior changes. It is not the claim that the entire audit apparatus can be made continuous. The standard’s own structure puts the snapshot at the foundation, where the methodology is being established, and the continuous activity at the layer above it, where the methodology is being executed. The standard specifies where point-in-time assessments occur rather than pretending they can be removed.

The recursion that the field keeps trying to dodge

There is a pattern I keep seeing across different proposals that all have a similar shape. Confidential computing with cryptographic attestation of guardrails. LLM-as-judge architectures that evaluate other LLMs in production. Automated drift detectors that monitor the model. Continuous conformity engines that maintain a live status. Each of these is sometimes positioned as the replacement for the audit. Each of them is, in fact, an additional AI or software system whose own behavior now needs to be characterized and evaluated.

This is what I’m calling the bootstrapping problem in continuous AI assurance: an AI system’s reliability can’t be verified by checking it against itself or another unverified AI system. It must, at some point, be bootstrapped on an independent foundational evaluation.

I made this argument in a different context recently, in a piece on runtime guardrails. The shape of the problem is the same. If an LLM-based judge is the mechanism by which you claim a system is safe or compliant, then the judge is itself an AI system whose behavior needs to be characterized and evaluated. You cannot use an unaudited AI to certify another AI and treat the result as a closed assurance argument. The question simply moves one level up. How do we know the judge works? How do we know the drift detector has set its thresholds in places that are actually meaningful for the population the model is making decisions about? How do we know the operationalization specification, with its choice of metrics and threshold values, is a faithful translation of the conformity requirements it is supposed to operationalize? In each case, the answer cannot come from the layer that is doing the work. It has to come from a layer above, evaluated by someone independent of both, on the basis of evidence that exists at a particular point in time.
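As a small illustration of what "characterizing the judge" can mean in practice, the sketch below scores a judge against a reference set labeled independently of it. Everything here is hypothetical: judge_error_rates is an illustrative helper, toy_judge is a keyword rule standing in for an LLM-based judge, and the four reference items are invented. A real evaluation would need a far larger, representative sample labeled by qualified reviewers.

```python
from typing import Callable, List, Optional, Tuple


def judge_error_rates(
    judge: Callable[[str], bool],
    reference: List[Tuple[str, bool]],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (false_pass_rate, false_fail_rate) of a judge on labeled items.

    Each reference item is (text, is_acceptable), labeled independently of
    the judge. A rate is None when the reference set has no items of that kind.
    """
    verdicts = [(judge(text), acceptable) for text, acceptable in reference]
    n_bad = sum(1 for _, acceptable in verdicts if not acceptable)
    n_good = len(verdicts) - n_bad
    false_pass = sum(1 for passed, acceptable in verdicts if passed and not acceptable)
    false_fail = sum(1 for passed, acceptable in verdicts if not passed and acceptable)
    return (
        false_pass / n_bad if n_bad else None,
        false_fail / n_good if n_good else None,
    )


def toy_judge(text: str) -> bool:
    """Stand-in for a production judge: passes anything not saying 'guaranteed'."""
    return "guaranteed" not in text.lower()


# Invented examples with independent labels (True means acceptable).
reference = [
    ("Returns are guaranteed.", False),
    ("Past performance is not indicative of future results.", True),
    ("You cannot lose money with this product.", False),
    ("Fees apply; see the disclosure statement.", True),
]

print(judge_error_rates(toy_judge, reference))  # (0.5, 0.0)
```

The toy judge passes half of the unacceptable statements, and nothing in its own output would reveal that. The error only becomes visible against labels the judge did not produce, gathered and reviewed at a particular point in time.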

Continuous assurance does not eliminate this recursion: it relocates it. The point-in-time work that used to happen at the level of the AI system now happens at the level of the assurance apparatus. It happens less often, and it covers a smaller and more stable surface, but it cannot be deleted. There is no escape from bootstrapping the trust chain on the work of an independent professional, looking at retrospective evidence, applying judgment, on a particular day. That is where my earlier piece on ISAE 3000 and AI assurance lands, and it is where this one lands too.

What does a tiered AI assurance stack look like?

The right way to think about all of this, in my view, is as a tiered assurance stack rather than a choice between two methodologies.

At the top is the AI system itself. Its behavior is changing. Its inputs are drifting. Its operational context is evolving. A point-in-time audit of this layer alone is, as APRA correctly observes, an inadequate basis for sustained claims about how the system is behaving in production six months after the audit was issued.

In the middle is the continuous monitoring and assurance apparatus. Drift detectors, performance monitors, fairness measurements, guardrail evaluations, conformity-status engines, and the whole CABCA loop, if you are building one. This layer’s job is to keep the consequential claims about the system honest between foundational reviews. It generates the running evidence that lets stakeholders answer the question “is the system still doing what was originally claimed about it” without commissioning a fresh audit every time the question is asked.

At the bottom, holding the whole thing up, is a foundational, point-in-time professional engagement that evaluates the original claims and the apparatus. Did the operationalization specification translate the requirements faithfully? Are the metrics actually measuring what they purport to measure? Are the thresholds defensible? Are the prerequisites in place? Is the organization sufficiently independent, in the structural sense the standard requires, that the continuous outputs mean what they claim to mean? This is the assurance over the assurance. It is what every layer above it ultimately depends on for its credibility, and it has to be done by someone qualified, looking at retrospective evidence, on a particular day. Done well, it will be repeated on a defined cadence and after any material change to the system, the data, or the requirements, exactly as TS 104 008 contemplates.

Each layer of the tiered assurance stack is weak without the others. A point-in-time audit of just the system, without continuous monitoring underneath it, decays in value the moment the system changes. A continuous monitoring system without an independent foundational evaluation is a confident report generator with no warrant. The two are not in competition. They are different parts of the same stack, and the move the field needs to make is not to swap one for the other, but to take both seriously.

What should APRA-regulated entities actually do?

If you are an APRA-regulated entity reading the letter and thinking about how to close the assurance gap APRA has flagged, the practical implication is this: standing up continuous monitoring on its own is not an alternative to commissioning independent assurance over your AI systems. Continuous monitoring requires independent assurance before its outputs can be trusted as the basis for any claim you would want to put in front of a regulator, a board, or an enterprise customer. The same is true for any continuous compliance tooling you procure from a vendor, any LLM-as-judge architecture you deploy in production, and any drift detector you build internally. Each of them is an AI or software system making claims about another AI system, and each of them needs (ideally) an independent layer holding up its own claims before those claims can be relied on.

There is a related point in the APRA letter that I think gets overlooked, and that lands on the same conclusion from a different direction. APRA observes that internal audit and risk functions are challenged: many lack the specialist skills and tools required to engage in AI assessment or audit, particularly where agentic behaviour, automated decision making, or AI-assisted code generation are involved. Accordingly, APRA observes that assurance activities often lag deployment. If an organization does not yet have the specialist capability to conduct a credible point-in-time evaluation of an AI system, it certainly does not have the specialist capability to design, validate, and operate the continuous monitoring apparatus that would replace one. The skills problem the regulator is naming is precisely the skills problem the foundational layer of the tiered assurance stack exists to solve.

The encouraging part is that the foundational layer is small, well-scoped work. It does not have to be repeated as often as the continuous loop runs. It is the kind of engagement that can be completed in weeks against a defined methodology by an independent party. The thing that has to be continuous is the measurement of the system. The thing that has to be periodic, professional, and independent is the evaluation of the apparatus that does the measuring. Once you have the architecture sorted, both pieces become tractable, and the gap APRA is asking entities to close becomes a problem you can actually plan against.

APRA’s critique of point-in-time methods, in other words, is correct as far as it goes. The mistake would be to read it as saying that point-in-time methods are no longer needed at all. The standards body that has done the most careful thinking about continuous AI assurance to date does not say that, and once you see why, the path forward gets clearer.

Frequently asked questions

Does CABCA replace traditional AI audit?

No. ETSI TS 104 008 explicitly separates “foundational trust in the audit method” from “ongoing trust in the audit results.” The foundational layer is established by a one-time review and approval of the Operationalization Specification, repeated only when the system, its data, or its requirements materially change. The continuous layer runs on top of that foundation. Both are required, and the standard’s third-party path is explicit that the independent auditor takes responsibility for the foundational layer.

What is the difference between point-in-time and continuous AI assurance?

Point-in-time AI assurance is the evaluation of a system or its assurance apparatus at a specific moment, against retrospective evidence, by an independent professional applying judgment. Continuous AI assurance is the ongoing automated measurement of the system against pre-defined metrics and thresholds. They operate at different layers of the same tiered assurance stack and are complementary rather than competing.

Who needs an independent foundational audit if continuous monitoring is in place?

Anyone making external claims about an AI system’s compliance, fairness, robustness, or safety on the basis of continuous monitoring outputs. The continuous monitoring apparatus is itself an AI or software system whose claims need to be characterized and evaluated by an independent party before its outputs can underwrite assurance claims. This is the bootstrapping problem in continuous AI assurance, and there is no way to avoid it.

How does APRA’s letter relate to the EU AI Act and ETSI TS 104 008?

APRA’s April 2026 letter to industry calls for a step change in AI risk management among Australian banks, insurers, and superannuation trustees, naming reliance on point-in-time and sample-based assurance methods as inadequate for dynamic AI systems. ETSI TS 104 008, published January 2026, specifies the continuous conformity assessment methodology that addresses this concern under the EU AI Act framework. Both push in the same direction: continuous evidence collection across the lifecycle, anchored in independent foundational evaluation.

What is the bootstrapping problem in continuous AI assurance?

The bootstrapping problem is the recursive issue that a system’s reliability can’t be verified by checking it against itself or another unverified system. In a continuous AI assurance context, this means that any continuous assurance, monitoring, or guardrail apparatus needs to be characterized and evaluated before it can make credible claims about the AI system it monitors. The recursion has to terminate somewhere, and the only honest place to terminate it is in a point-in-time engagement conducted by a qualified independent professional applying judgment to retrospective evidence. Continuous assurance does not eliminate this engagement: it relocates it from the AI system to the assurance apparatus.

Shea Brown is the Founder and CEO of BABL AI, an independent AI assurance firm specializing in independent audit and assurance over AI systems for bias, ethical risk, and governance compliance. BABL AI’s engagements are conducted under international assurance standards (ISAE 3000), with certified lead auditors. Shea has testified before state legislatures, presented to the U.S. Equal Employment Opportunity Commission, and engaged with the European Commission on the EU AI Act. He holds a PhD in Astrophysics and was a faculty member at the University of Iowa for 11 years.
