AI in ED QI

Jeremy: Welcome back to the TIME Podcast. I’m Jeremy, and today we’re walking straight into one of the most important transitions in emergency care: the shift from retrospective quality improvement to continuous, AI-enabled detection of care gaps.

Hamish: And I’m still Hamish. EDs in Australia, New Zealand, and across the world are now producing unprecedented volumes of digital information. Vitals, labs, imaging, notes, timestamps, logistics data. But the real question is: What do we actually do with all of it?

Jeremy: Exactly. Humans can’t operationalise this scale of information. But AI can. And today, we’re going to explore where AI is genuinely elevating quality and safety and where it has very publicly stumbled.

Hamish: Let’s begin with the reality: ED quality improvement is still largely built on incident reports, case reviews, and RCAs. For anyone unfamiliar, an RCA, or root cause analysis, is a structured, retrospective dissection of a significant adverse event: identifying system failures, tracing contributing factors, and recommending corrective actions.

Jeremy: The problem is structural. RCAs occur months after the event. Incident reporting captures only a fraction of safety issues: clinicians are too busy, and cognitive biases filter what gets reported. And importantly, we’ve become very good at measuring processes but far less good at measuring clinical quality.

Hamish: Precisely. We track door-to-doctor time and ED length of stay, but not real-time diagnostic reliability, adequacy of follow-up, or early signs of cognitive overload. Traditional QI illuminates only a small part of the safety landscape.

Jeremy: This sets the stage for AI. Unlike manual QI, AI can analyse thousands of variables simultaneously, continuously, silently, and without fatigue.

Jeremy: One of the clearest early successes is retrospective machine learning analysis. Look at the RAPIDx trial in South Australia: algorithms comparing each chest-pain patient to massive cohorts of “lookalike” past patients to identify which pathways led to better outcomes.

Hamish: This is the modern audit cycle. Instead of a registrar reviewing 200 charts over a month, AI reviews hundreds of thousands, detecting systematic delays, deviations from protocol, or inequities that human reviewers would never see.

Jeremy: And importantly, retrospective insights aren’t simply descriptive. They inform guideline improvement, workflow redesign, and even real-time decision support tools for future patients.

Hamish: Retrospective AI is not the destination; it’s the foundation. It gives us a learning loop we’ve lacked for decades.

Hamish: Real-time AI is where the stakes rise. Systems continuously synthesise vitals, lab trends, triage notes, and clinician behaviour to detect early deterioration, sepsis, or missed risks.

Jeremy: But let’s clarify a common metric: AUC, or Area Under the Curve. An AUC of 1.0 equals perfect discrimination. An AUC of 0.5 equals random guessing. So when the Epic Sepsis Model, deployed widely across U.S. hospitals, showed an AUC of 0.63 in independent testing, that’s essentially a low-grade signal at high stakes.
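
To make the AUC definition concrete, here is a minimal sketch in Python. It treats AUC as the chance that a randomly chosen true case gets a higher score than a randomly chosen non-case, which is one standard way of computing it. The scores and labels are invented toy data, and the function name `auc` is our own, not part of any model discussed in this episode.

```python
def auc(scores, labels):
    """Probability that a random positive case outscores a random
    negative case, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (1 = true case, 0 = non-case).
labels = [0, 0, 0, 1, 1, 1]

# Every true case outranks every non-case: perfect discrimination.
perfect = auc([0.1, 0.2, 0.3, 0.7, 0.8, 0.9], labels)

# Everyone gets the same score: no better than guessing.
useless = auc([0.5, 0.5, 0.5, 0.5, 0.5, 0.5], labels)

print(perfect, useless)  # prints: 1.0 0.5
```

On that scale, 0.63 sits much closer to the uninformative score than to the perfect one.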

Hamish: And the operational impact was worse: the model triggered alerts on 1 in 5 patients, with eight false positives for every genuine case. This is the definition of alert fatigue: the psychological phenomenon in which the volume of alerts degrades clinicians' responsiveness.
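
The arithmetic behind those two figures can be sketched as follows. The alert rate (1 in 5) and the ratio of eight false positives per genuine case come from the episode; the cohort of 1,000 patients is an assumption chosen only to make the numbers easy to read, and `alert_burden` is a hypothetical helper name.

```python
def alert_burden(patients, alert_rate, false_per_true):
    """Split the expected alerts into genuine and false ones and
    return the positive predictive value (PPV)."""
    alerts = patients * alert_rate
    if alerts == 0:
        raise ValueError("no alerts to analyse")
    # For every genuine alert there are `false_per_true` false ones,
    # so genuine alerts are 1 part in (false_per_true + 1).
    true_alerts = alerts / (false_per_true + 1)
    false_alerts = alerts - true_alerts
    ppv = true_alerts / alerts
    return alerts, true_alerts, false_alerts, ppv


# 1,000 patients is an illustrative cohort size, not a figure from the study.
alerts, true_alerts, false_alerts, ppv = alert_burden(1000, 1 / 5, 8)

print(f"alerts: {alerts:.0f}")
print(f"genuine: {true_alerts:.0f}, false: {false_alerts:.0f}")
print(f"PPV: {ppv:.1%}")
```

Under those assumptions, roughly one alert in nine points at a genuine case, which is why clinicians learn to dismiss them.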

Jeremy: Once clinicians start dismissing alerts, the system collapses. And importantly, alert fatigue is not benign; it is itself a patient safety risk.

Hamish: But we shouldn’t let one high-profile failure overshadow the category. Real-time tools, especially those using NLP to interpret triage narratives, can detect risk signals that structured scoring systems consistently miss.

Jeremy: Real-time AI requires precision. When calibrated well, it becomes an additional sensory layer in the ED. When calibrated poorly, it becomes noise.

Jeremy: Now, let’s turn to the evidence. Not the hypothetical, but the real-world deployments that show us what AI can truly achieve when implemented well. First, New Zealand and the AI Scribe Programme.

Hamish: New Zealand’s nationwide rollout of AI scribes is remarkable not because it’s flashy, but because it’s quietly transformational. The technology listens to the clinician–patient interaction, drafts the note, and hands back enormous cognitive bandwidth. Clinicians reported they could see an additional patient per shift, and more importantly, their documentation became more reliable.

Jeremy: And that reliability matters. Documentation gaps are one of the most common sources of downstream care failures: missed follow-up, inconsistent plans, and safety-netting omissions. The AI scribe addresses these without changing clinical judgement, simply by ensuring the record reflects the reality of the consultation.

Hamish: And it’s worth emphasising the cultural piece here: New Zealand didn’t simply adopt an off-the-shelf solution. They localised language, respected Māori terminology, and addressed privacy concerns transparently. That’s why clinicians embraced it.

Jeremy: Next is South Australia’s statewide chest X-ray AI rollout, arguably the most clinically elegant use of AI so far. Radiologists described it as having a “second pair of eyes” available twenty-four-seven. One that never tires, never rushes, and never loses vigilance during a high-volume night shift.

Hamish: And the gains were not abstract. Subtle pneumothoraces, faint rib fractures, and small infective changes, the kinds of findings most vulnerable to human fatigue, were detected earlier and more consistently.

Jeremy: For ED teams, this is not about delegation. It’s about assurance. When your radiology report aligns with the AI’s highlighted regions, it reinforces confidence. When it doesn’t, it prompts a second look. That’s the kind of partnership that elevates diagnostic reliability without eroding professional autonomy.

Hamish: Canada’s SurgeCon platform is another compelling example because it doesn’t target a clinical condition. It targets flow, the perennial challenge of every emergency department. It integrates predictive analytics with predefined operational responses triggered when crowding thresholds are crossed.

Jeremy: And the results were profound: one ED cut its average waiting time from 104 minutes to 42 minutes, despite a 25% increase in volume.

Hamish: Critically, the lesson here is not the prediction engine itself. It’s the structured choreography that follows the prediction: the predefined actions, the escalation pathways, the coordinated team response.

Jeremy: Exactly. Prediction without action is trivia. Prediction tied to organisational behaviour is QI.
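
The idea of tying a prediction to predefined actions can be sketched in a few lines of Python. This is an illustration only: the thresholds, the action names, and the `actions_for` helper are all hypothetical and are not SurgeCon's actual configuration or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    threshold: int  # predicted number of patients waiting that triggers the step
    action: str     # the predefined operational response


# Hypothetical playbook, invented for illustration.
PLAYBOOK = [
    EscalationStep(20, "open fast-track area"),
    EscalationStep(30, "call in on-call clinician"),
    EscalationStep(40, "escalate to hospital-wide bed management"),
]


def actions_for(predicted_waiting, playbook=PLAYBOOK):
    """Return every predefined action whose crowding threshold is crossed."""
    return [step.action for step in playbook
            if predicted_waiting >= step.threshold]


print(actions_for(10))  # below every threshold: no action
print(actions_for(32))  # first two steps fire
```

The design point is that the responses are agreed in advance, so the forecast triggers a known action rather than a discussion about what to do.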

Jeremy: Finally, we need to revisit the Epic Sepsis Model — a textbook example of how AI can fail when transparency, validation, and calibration are absent.

Hamish: With an independent AUC of 0.63, and alerts firing on 20% of all inpatients, most of them false positives, it created more noise than insight.

Jeremy: And beyond the statistics, the real issue was cultural. Clinicians lost trust. Alerts became background noise. And because the model was proprietary and opaque, hospitals couldn’t interrogate or adjust its internal workings

Hamish: It demonstrates that AI can create new gaps — not just close old ones — if implemented without governance.

Jeremy: Before we look forward, we need to understand the adversaries: the forces that consistently undermine AI in clinical environments. First, bias and equity.

Hamish: Bias is not a footnote. It’s a predictable consequence of training models on datasets that under-represent key populations. Indigenous communities, rural groups, culturally diverse patients: if they're missing from the dataset, the model cannot perform equitably for them.

Jeremy: And here’s the nuance. Bias isn’t solved by “adding more data.” It requires re-examining the labels used in training, which often encode historical inequities themselves.

Hamish: Then there's alert fatigue. Alert fatigue isn’t simply an annoyance. It’s a safety hazard. An AI that over-alerts erodes vigilance and heightens cognitive load. Once clinicians start disregarding alerts, even the correct alerts lose effectiveness.

Jeremy: We also have to mention miscalibration and drift. Miscalibration occurs when a model’s predictions lose alignment with clinical reality. A model trained in one population or even one season may perform poorly in another.
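
One simple way to check whether predictions still line up with reality is an observed-to-expected ratio: the number of events that actually happened divided by the number the model predicted. The sketch below uses invented data for two seasons, and the 20% tolerance is an assumption for illustration, not a recommended clinical threshold.

```python
def observed_to_expected(predicted_risks, outcomes):
    """Ratio of events that occurred to events the model expected.
    About 1.0 means well calibrated on average; above 1.0 means the
    model is under-predicting risk, below 1.0 over-predicting."""
    expected = sum(predicted_risks)
    if expected == 0:
        raise ValueError("model predicted zero expected events")
    return sum(outcomes) / expected


def needs_review(ratio, tolerance=0.2):
    """Flag the model when the ratio strays from 1.0 by more than the
    tolerance (0.2 is an illustrative choice)."""
    return abs(ratio - 1.0) > tolerance


# Invented data: the model predicts a 10% risk for each of ten patients.
summer_risks = [0.1] * 10
summer_outcomes = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # 1 event, as expected

winter_risks = [0.1] * 10
winter_outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # 3 events, three times expected

summer_ratio = observed_to_expected(summer_risks, summer_outcomes)
winter_ratio = observed_to_expected(winter_risks, winter_outcomes)

print(round(summer_ratio, 2), needs_review(summer_ratio))
print(round(winter_ratio, 2), needs_review(winter_ratio))
```

Run on a rolling window, the same check gives an early sign that a model has stopped matching the population it is being used on.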

Hamish: And drift is inevitable. Practices change, patient demographics shift, disease patterns evolve. If you don’t monitor and recalibrate, your AI degrades silently. And what about workflow, integration and trust?

Jeremy: No AI system succeeds in a workflow it wasn’t designed for. Interruptive alerts at the wrong moment, unclear responsibility pathways, or outputs without explanation all erode clinician acceptance.

Hamish: Trust is earned, not assumed. AI must justify its presence by improving reliability without increasing friction.

Hamish: OK. Let's talk about the implementation lessons. What did the successful systems get right?

Jeremy: So, stepping back from the case studies, what separates the AI deployments that worked from those that quietly faded away? When you look across New Zealand’s scribes, South Australia’s imaging AI, and Canada’s SurgeCon, three principles come up again and again: strong governance, genuine clinician engagement, and alignment with real QI priorities.

Hamish: Let’s start with governance — because this is where much of the enthusiasm collapses. Good governance means: validating the model locally, defining acceptable performance thresholds, insisting on transparency, and, crucially, having the authority to switch the model off if it underperforms. You wouldn't keep using a defective defibrillator. AI deserves the same standard.

Jeremy: And governance isn’t just technical oversight — it’s cultural oversight. It reassures clinicians that there’s a safety net around the tool. Without that reassurance, no amount of accuracy will drive adoption.

Hamish: Then there’s clinician engagement. Every successful implementation had clinicians involved early, not as end-users but as co-designers. That means the AI grew around the workflow, not the other way around.

Jeremy: And the psychology here matters. If clinicians feel the AI is observing or judging them, adoption collapses. If it’s framed as reducing burden and improving reliability, adoption accelerates.

Hamish: Exactly. The AI scribe in New Zealand succeeded not because it was technologically superior, but because clinicians felt the value: less typing, more thinking, and more face-to-face care.

Jeremy: And the final principle is alignment with actual QI needs. The standout deployments all targeted problems clinicians already recognised: documentation burden, imaging accuracy, crowding, and diagnostic variability.

Hamish: Not hypothetical problems, real ones. AI doesn’t succeed because it’s novel; it succeeds when it integrates into the department’s existing improvement agenda.

Jeremy: So that’s the present state: strong governance, deliberate engagement, precise alignment.

Jeremy: Which raises the next question. If those principles define success today, what will define success tomorrow? Where is this heading as AI becomes embedded across systems rather than in isolated tools?

Hamish: Exactly, which brings us into the future landscape.

Hamish: The next phase won’t be about standalone tools at all. It’s about integrated AI ecosystems. Platforms that operate across clinical, operational, and safety domains simultaneously. Imagine a system that forecasts ED surges, monitors patients for early deterioration, detects missed follow-up before it becomes harmful, analyses yesterday’s cases for diagnostic variance, and coordinates staffing or resource escalation, all through one interface, not five separate ones.

Jeremy: And importantly, these systems won’t be static. They’ll recalibrate continuously as disease patterns shift, as patient demographics evolve, and as practice changes. That’s the essence of a learning health system — AI that doesn’t merely monitor the environment, but adapts to it.

Hamish: Another defining feature of future AI is the centrality of equity. We’re moving toward a world where an AI model must demonstrate not just accuracy, but consistency across demographic groups: Indigenous patients, rural populations, culturally diverse communities.

Jeremy: And that’s not optional anymore. It’s becoming a regulatory expectation: bias audits, subgroup analysis, transparent reporting. These will become as routine as calibration checks.
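
A subgroup analysis of the kind described here can be sketched as a per-group sensitivity check: of the patients in each group who truly had the condition, what fraction did the model flag? The records and the group labels "A" and "B" below are invented placeholders, and the helper names are our own.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """records: iterable of (group, has_condition, alerted) tuples.
    Returns the share of true cases the model flagged, per group."""
    hits = defaultdict(int)
    cases = defaultdict(int)
    for group, has_condition, alerted in records:
        if has_condition:
            cases[group] += 1
            if alerted:
                hits[group] += 1
    return {group: hits[group] / cases[group] for group in cases}


def max_gap(rates):
    """Largest difference in sensitivity between any two groups."""
    return max(rates.values()) - min(rates.values())


# Invented audit data: (group, has_condition, alerted).
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 0, 1),
]

rates = sensitivity_by_group(records)
print(rates)           # {'A': 0.75, 'B': 0.5}
print(max_gap(rates))  # 0.25
```

A headline accuracy figure would hide that gap; reporting the per-group numbers is what makes it visible.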

Hamish: And perhaps the most significant conceptual shift: AI will stop being viewed as “technology” and start being viewed as clinical infrastructure, part of the reliability architecture of emergency care.

Jeremy: Just like the monitors, the blood gas machine, or the trauma paging system, AI will seamlessly function in the background. Not centre stage, not demanding attention, but constantly supporting safer, more consistent care.

Hamish: So the themes crystallise into two distinct but connected insights. First, AI succeeds when it sits inside a disciplined framework: strong governance, genuine clinician involvement, and a focus on solving real, pre-existing gaps in care. Without those foundations, even the most sophisticated model will fail to translate into practice.

Jeremy: And second, the trajectory is clear. AI is moving toward full ecosystem integration — platforms that learn continuously, adapt to shifting patterns, and uphold equity as a core requirement rather than an afterthought. The future ED will rely on AI not as an accessory, but as part of its operational and clinical infrastructure.

Hamish: That brings us to the end of this TIME episode. AI’s role in emergency care is accelerating, but the core principle remains unchanged: technology must amplify clinical excellence, not compete with it.

Jeremy: For those wanting to dive deeper, we’ve prepared supplementary material that accompanies this episode: extended reading, practical implementation checklists, and annotated case analyses. You’ll find it in the members’ section of Clintix.com. If you can’t access it, speak with any of the friendly conference organisers and they’ll point you in the right direction.

Hamish: And a special thank-you to our sponsor, Clintix Pro, an AI-powered study companion for Australian critical care trainees, helping them master exam preparation through adaptive, high-fidelity learning systems.

Jeremy: Built by clinicians, for clinicians, and aligned with the standards upheld by the Australasian College for Emergency Medicine and the Australian and New Zealand College of Anaesthetists, this study tool is a game-changer for intelligent exam preparation.

Hamish: Thanks for joining us. We’ll see you at the next session. Bye!

AI and Evidence in Emergency and Critical Care: AI in ED QI — full transcript