Model Drift, Bias, and Explainability: Why AI Risk Gets More Complicated in Practice

Written by Jeremy Werner

Jeremy is an experienced journalist, skilled communicator, and constant learner with a passion for storytelling and a track record of crafting compelling narratives. He has a diverse background in broadcast journalism, AI, public relations, data science, and social media management.
Posted on 03/23/2026
In Podcast

AI risk rarely shows up all at once. More often, it accumulates quietly—through shifting data, flawed assumptions, misunderstood outputs, and systems that seem to work until the moment they don’t. In Part 2 of this Lunchtime BABLing series on the many risks of AI, BABL AI CEO Dr. Shea Brown is once again joined by Jeffery Recker for a fast-moving but thoughtful conversation about some of the most important challenges organizations face when AI moves from experimentation to real-world use.

Where the first installment focused on concepts like data poisoning, prompt injection, and hallucinations, this episode turns toward a different but equally important set of issues: model drift, bias and discrimination, and the growing explainability gaps that emerge when organizations rely on increasingly complex AI systems. The conversation moves between technical detail and broader reflection, but the throughline is clear. As AI systems become more embedded in business processes, organizations need more than enthusiasm and surface-level familiarity. They need a deeper understanding of how these systems behave over time, where risk enters, and how to evaluate whether AI is actually doing what it is supposed to do.

Why Model Drift Matters More Than People Think

One of the first concepts the episode revisits is model drift, a problem that is often treated as a technical concern for developers but has consequences far beyond the engineering team. Shea explains that model drift is, at its core, a decline in model performance over time. In many cases, that decline happens because the data a model encounters in the real world begins to differ from the data it was trained on or optimized for.

That mismatch matters because AI systems do not operate in static conditions. Markets change, language changes, customer behavior changes, and organizations themselves change. A model that seemed highly effective at launch can become less reliable without anyone noticing—unless there is a deliberate process in place to monitor it. Shea emphasizes that organizations need some way of tracking effectiveness over time, whether through periodic testing, monitoring key metrics, or comparing incoming data distributions to historical training data.
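Shea's suggestion to compare incoming data distributions against historical training data can be made concrete with a small sketch. The Population Stability Index below is one standard drift metric (it is not named in the episode), and the synthetic data and rule-of-thumb thresholds are purely illustrative:

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between a training ('expected') sample
    and a production ('actual') sample. Common rule of thumb: < 0.10 stable,
    0.10-0.25 moderate shift, > 0.25 significant drift."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = sum(x > e for e in edges)  # which bin x falls into
            counts[idx] += 1
        # a small floor avoids log(0) and division by zero in sparse bins
        return [max(c / len(sample), 1e-4) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(5000)]     # training-era data
live_stable = [random.gauss(0.0, 1.0) for _ in range(5000)]   # same distribution
live_drifted = [random.gauss(0.8, 1.2) for _ in range(5000)]  # shifted distribution

print(f"stable:  {psi(train, live_stable):.3f}")   # well below 0.10
print(f"drifted: {psi(train, live_drifted):.3f}")  # well above 0.25
```

Running a check like this on a schedule, per input feature, is one lightweight way to turn "monitor for drift" from an intention into a process.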

The discussion becomes especially useful when it shifts from enterprise systems to everyday tools like ChatGPT. Jeffery raises the practical reality that many users have seen models get better at some things and worse at others even within a single conversation. That opens the door to a more accessible version of “drift”: context problems in large language model interactions. Shea’s advice is simple but revealing. If a model goes off track, don’t just keep replying and try to steer it back. Edit the original prompt. Otherwise, the incorrect material stays in the context window and continues to influence later outputs. It is a small tip, but one that points to a larger truth: using AI well often depends less on magic and more on understanding how the system carries information forward.
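The context-window point can be illustrated with the message list that chat interfaces carry forward as context. The structure below mirrors common chat APIs, and the commented-out `send()` call is a hypothetical stand-in, not a real client:

```python
# Why editing the original prompt beats correcting mid-conversation:
# the "messages" list IS the context window, and everything in it
# keeps influencing later outputs.

# Steering back: the wrong turn stays in context.
steer_back = [
    {"role": "user", "content": "Summarize our Q3 sales data."},
    {"role": "assistant", "content": "...summary based on Q2 data..."},  # off track
    {"role": "user", "content": "No, I meant Q3, not Q2."},              # correction
]

# Editing the original prompt: the wrong turn never enters context.
edited = [
    {"role": "user", "content": "Summarize our Q3 sales data, not Q2."},
]

# response = send(model="...", messages=edited)  # hypothetical call
```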

Bias Is Not the Same as Discrimination

The episode then moves into one of the most contested and misunderstood areas in AI governance: bias and discrimination. Jeffery frames the conversation in a way that reflects a tension many practitioners feel. The word “bias” has become politicized, especially in public discourse, yet in technical and statistical contexts it remains an essential concept for evaluating whether AI systems are functioning properly.

Shea draws an important distinction. Bias, in the broad sense, refers to systematic shifts or distortions in outputs relative to what they should be. Discrimination, by contrast, occurs when those biased outputs lead to adverse consequences for people—such as being denied a job, a loan, or a public benefit. The difference matters because not every bias amounts to legal discrimination, but many can still introduce serious risks.

That distinction also helps explain why bias testing cannot be limited to identity-based harms alone. Shea describes bias more broadly as a way of examining whether performance metrics shift across different dimensions of input data. Sometimes those dimensions are tied to protected characteristics like race or gender. But they can also include geography, time of day, educational background, writing style, or other factors that affect how a system behaves. In other words, bias is not just a political or legal issue. It is part of the statistical reality of testing whether an AI system is robust, effective, and safe.
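One way to operationalize that kind of slice testing is to compute a metric per group along any input dimension and compare the results. A minimal sketch, where the records, the "region" dimension, and the 0.80 cutoff (the common "four-fifths" rule of thumb) are all illustrative:

```python
def selection_rates(records, group_key):
    """Selection rate (fraction of positive outcomes) per group."""
    totals, positives = {}, {}
    for r in records:
        g = r[group_key]
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + r["selected"]
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact(rates):
    """Ratio of the lowest group rate to the highest; < 0.8 is a red flag."""
    return min(rates.values()) / max(rates.values())

# Toy outcomes sliced by geography -- the same check works for any dimension.
records = (
    [{"region": "urban", "selected": 1} for _ in range(60)]
    + [{"region": "urban", "selected": 0} for _ in range(40)]
    + [{"region": "rural", "selected": 1} for _ in range(35)]
    + [{"region": "rural", "selected": 0} for _ in range(65)]
)

rates = selection_rates(records, "region")
print(rates)                    # {'urban': 0.6, 'rural': 0.35}
print(disparate_impact(rates))  # roughly 0.583 -> worth investigating
```

The same loop can slice on time of day, writing style, or any other dimension the risk assessment surfaces, which is exactly why bias testing is broader than protected characteristics alone.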

How Bias Creeps In Even When Demographics Are Removed

One of the strongest sections of the episode addresses a common misconception that still surfaces in AI development: if a system does not explicitly use demographic fields, it cannot be biased. Shea notes that this misunderstanding remains common, especially in areas like hiring, where organizations may insist that their systems only compare resumes to job descriptions and therefore should not present a fairness problem.

But AI systems do not need a field labeled “race” or “gender” to produce biased outcomes. They can infer sensitive characteristics indirectly through language patterns, educational history, geography, employment gaps, and countless other proxies. Jeffery makes the point in practical terms: people from different places write differently, speak differently, and structure information differently. Those patterns can become signals for an algorithm whether the developer intended them to or not.

That is why the conversation repeatedly returns to diversity of thought and structured review. Having a wider range of perspectives involved in development and oversight can help uncover risks that a narrow team might miss. But Shea is equally clear that organizations cannot rely on informal awareness alone. They need repeatable methods for surfacing hidden risks.

The CIDA Framework and Structured Risk Assessment

This is where the episode becomes especially valuable for people trying to understand BABL AI’s approach. Shea describes the CIDA framework—context, input, decision, action—as a way of articulating the broader sociotechnical system around an AI tool. Rather than treating the model as an isolated piece of software, the framework forces organizations to define where the system operates, what data enters it, how it makes decisions, and what actions result from those decisions.
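As a rough illustration of what articulating a system through the CIDA lens might look like, here is a minimal sketch. The field names and the example hiring system are invented for illustration and are not BABL AI's actual template:

```python
from dataclasses import dataclass, field

@dataclass
class CIDARecord:
    """One system described through the CIDA lens: context, input,
    decision, action -- plus the people the system touches."""
    context: str        # where and for whom the system operates
    inputs: list        # what data enters the system
    decision: str       # what the model decides or scores
    action: str         # what actually happens as a result
    stakeholders: list = field(default_factory=list)

resume_screener = CIDARecord(
    context="High-volume screening for entry-level roles, US market",
    inputs=["resume text", "job description", "application timestamp"],
    decision="Relevance score ranking applicants against the role",
    action="Lowest-ranked applicants are auto-rejected without human review",
    stakeholders=["applicants", "recruiters", "hiring managers"],
)

print(resume_screener.action)  # the step where harm would actually occur
```

Even a lightweight record like this forces the question the episode keeps returning to: not just what the model computes, but what happens downstream and to whom.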

From there, the conversation moves into stakeholder analysis and risk assessment. Shea explains that it is not enough to identify a generic stakeholder like “the applicant” in a hiring system. Organizations need to think more deeply about who those applicants are, what different backgrounds or circumstances they bring, what they need from the process, and how the system could fail them in different ways. That richer understanding helps expose the points where bias can emerge and where harm can occur.

In practice, this is one of the most important lessons of the episode. Good AI governance is not just about legal checklists or abstract principles. It is about building structured ways of seeing risks that are otherwise easy to overlook.

Why Explainability Gets Harder With Modern AI

The final major theme of the episode is explainability, particularly the growing gap between what organizations want to know and what modern AI systems can actually reveal. In traditional machine learning contexts, explainability often involved identifying which input features most influenced a decision. In finance, for example, lenders might point to debt-to-income ratio or payment history as key reasons a person was denied credit.
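That traditional, feature-level style of explanation can be sketched with a toy linear scorer, where each input's contribution to the decision can be read off directly. The weights, features, and cutoff are invented, not a real credit model:

```python
# In a simple linear model, every feature's contribution to a decision
# is directly inspectable -- the kind of explanation traditional
# lending systems could offer.

weights = {"debt_to_income": -2.0, "late_payments": -1.5, "years_of_history": 0.4}
bias_term = 1.0
cutoff = 0.5

def score(applicant):
    """Return the total score and each feature's additive contribution."""
    contributions = {k: weights[k] * applicant[k] for k in weights}
    total = bias_term + sum(contributions.values())
    return total, contributions

applicant = {"debt_to_income": 0.6, "late_payments": 2, "years_of_history": 5}
total, contribs = score(applicant)
print(f"score = {total:.2f}, approved = {total > cutoff}")
for feature, c in sorted(contribs.items(), key=lambda kv: kv[1]):
    print(f"  {feature}: {c:+.2f}")  # the 'key reasons' behind the outcome
```

With a large language model there is no equivalent table of contributions to print, which is precisely the gap the rest of this section describes.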

But as Shea explains, that challenge becomes much harder when large language models and black-box systems are involved. If an organization feeds a complex mix of data into a model and asks it to generate a decision or score, the internal reasoning is no longer easy to parse. There may be thousands or millions of operations contributing to a result, and any explanation produced afterward may simply be a plausible reconstruction rather than a faithful account of what actually caused the output.

That distinction is critical. A model may be able to offer an explanation, but that does not mean the explanation is true in a causal sense. Shea points to mechanistic interpretability and “faithful chain of thought” as active areas of research, but he is clear that the field is still far from solving the problem. For now, trust in AI systems has to come not from perfect self-explanation, but from validation, testing, monitoring, and well-designed guardrails.

Choosing the Right AI for the Right Decision

The conversation closes on a practical note that cuts to the heart of responsible deployment. Not every problem should be handed to a generative AI model simply because the model is available. Jeffery raises the example of financial services and questions whether a loan decision—something already governed by strict law and longstanding expectations around explainability—is really the place to introduce ambiguous black-box reasoning.

Shea’s response captures the tension well. Simpler models may be more explainable, but they can also be crude approximations of reality. More complex systems may capture richer signals, but they introduce new trade-offs around bias, interpretability, and trust. The answer is not that one approach always wins. It is that organizations need to be intentional about what kind of AI they are using, why they are using it, and whether the benefits justify the risks in that context.

Why This Episode Matters

This episode works because it does not treat AI risk as a single problem with a single solution. Instead, it shows how multiple issues—performance drift, biased outputs, hidden proxies, weak explanations, and inappropriate deployment choices—interact in practice. Organizations adopting AI at scale are not just choosing tools. They are choosing how much uncertainty they are willing to tolerate and what systems of oversight they will build around that choice.

For professionals in AI governance, assurance, product development, risk management, or compliance, this conversation is a useful reminder that responsible AI is not only about intent. It is about structure, discipline, and the willingness to question whether a system is still working the way you think it is.

Where to Find Episodes

Lunchtime BABLing can be found on YouTube, Simplecast, and all major podcast streaming platforms.

Need Help?

Interested in building practical skills in AI governance and auditing? Visit BABL AI’s website for courses, certifications, and resources on AI risk management, algorithmic audits, and compliance.

Subscribe to our Newsletter

Keep up with the latest on BABL AI, AI auditing, and AI governance news by subscribing to our newsletter.