Webinar

Testing Healthcare AI In 2026: A Deep-Dive On 60+ Peer-Reviewed Evaluations For Clinical Tasks, Bias, Safety, And Regulation

Watch live
June 10, 2026 @ 2:00 PM ET

Healthcare AI accountability has shifted to the deploying organization. The FDA’s 2024 guidance on lifecycle management of AI-enabled devices, HHS HTI-1’s transparency requirements, ACA Section 1557’s algorithmic-discrimination provisions, and the 2025–2026 wave of state AI-impersonation laws all land on the deploying organization — the hospital, payer, or digital-health company that puts the AI in front of a patient — rather than the model vendor. And the failure modes that produce enforcement risk are the ones most testing programs miss. A clinical task benchmark misses demographic bias. A bias audit misses cognitive bias and sycophancy. A red-team exercise misses regulatory readiness. A privacy scan misses adolescent-confidentiality and child-safety failures. None of it runs continuously in production, where models drift, RAG corpora change, and new regulations take effect every quarter.

We’ll walk through 60+ healthcare-specific test suites spanning seven categories: clinical decision support, general medical knowledge, documentation and patient communication, research and administration, safety and robustness, cognitive and demographic bias, and social bias. The suites draw on peer-reviewed open-source sources including MedHELM, HealthBench, LangTest, and ChildSafeLLM — and on proprietary suites Pacific AI built where no public equivalent existed, including clinical cognitive bias, several demographic and social bias datasets, and targeted clinical safety probes. Every suite is clinician-reviewed, traceable to its source publication or methodology, and mapped to the 250+ regulations, frameworks, and standards in the Pacific AI Policy Suite, refreshed quarterly as new legislation takes effect.

What you’ll learn:

  • Which production failure modes — clinical, fairness, safety, regulatory — generic LLM benchmarks structurally miss, and why they surface only under healthcare-specific testing.
  • How the test library is organized across the seven categories and the specific suites in each.
  • How the same suites run as a pre-release CI/CD gate (Gatekeeper) and continuously against deployed systems (Guardian), throttled to near-zero production impact.
  • What “good” looks like for a healthcare AI testing program in 2026: continuous testing at the same cadence as production deployment, owned by the system team rather than a central committee, with quarterly re-testing against the Policy Suite

We’ll also present a live demo highlighting comprehensive testing and monitoring across pre-release and production environments. You’ll see a clinical case run through demographic perturbations with fairness scores recomputed live on each run, then watch the same test suite execute as a pre-release CI/CD gate and as a throttled probe against a live production endpoint. We’ll close the demo by publishing test results directly into a CHAI-compliant model card, with no manual authoring.

Webinar Sign Up


About the speakers
Alin Blidisel
Engineering Technical Lead at Pacific AI

Experienced Big Data Developer with a demonstrated history of working in the information technology and services industry.
One of the strongest attributes of my professional profile is my extensive expertise in Big Data. This experience enables me to make sense of vast amounts of unstructured and structured data and extract valuable insights to drive strategic decision-making. My knowledge spans across various Big Data technologies, several highly scalable and robust data processing systems to handle petabytes of data crucial for machine learning and data-driven applications.
Experienced with AWS, Google Cloud, and Microsoft Azure, trying to use the best practices in cloud infrastructure management but also to align them with AI-driven strategies, facilitating seamless scalability and improved performance.
Strong engineering professional with a Master’s degree focused in Artificial Intelligence from West University of Timisoara.

Alex Thomas
Alexander Thomas
Principal Data Scientist, Pacific AI

Alex Thomas is a Principal Data Scientist at Pacific AI. He’s used natural language processing, machine learning, and knowledge graphs on clinical data, identity data, job data, biochemical data, and contract data. Now, he’s working on measuring Large Language Models and their applications.