Public health agencies and science communicators need AI to scale evidence-based messaging, but they cannot safely deploy existing tools without a dedicated verification infrastructure. The core challenge isn’t AI’s ability to generate content – it’s the lack of a standardized, reproducible way to verify that such content is accurate, safe, and contextually appropriate for public health communications.
This case study describes how Science to People and Pacific AI developed a 12-week certification process for VeriSciLM, establishing the evaluation protocols needed for trustworthy science communication at scale:
- Regulatory-Grade Alignment: Adopting a “Governance-by-Design” approach compliant with ISO/IEC 42005, the NIST AI RMF, CHAI guidelines, and the NAM Healthcare AI Code of Conduct to ensure alignment with emerging standards.
- Expert-Led Benchmarking: Implementing specialized red-teaming and bias testing calibrated against a panel of 13 science communication experts to move beyond generic accuracy metrics toward true public health utility.
- Automated Guardrails: Operationalizing trust by embedding Guardian Agent evaluations into a CI/CD pipeline, enabling threshold-based enforcement that prevents “drift” or degradation during ongoing model development (a minimal sketch of such a gate follows this list).
- Standardized Transparency: Generating CHAI-compliant Model Cards that document performance against healthcare-specific benchmarks (MedHELM, LangTest) and provide a clear clinical and scientific audit trail.
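
To make the threshold-based enforcement concrete, here is a minimal sketch of how a CI step might gate a release on evaluation scores. The report file name, metric names, and threshold values are illustrative assumptions for this sketch, not the actual Guardian Agent interface or VeriSciLM configuration.

```python
"""Minimal sketch of a threshold-based CI gate for model evaluation results.

Assumes the evaluation suite has already run and written a JSON report of
scores; the file name, metric names, and thresholds below are hypothetical.
"""
import json
import sys

# Illustrative release gates: each metric must meet or exceed its threshold.
THRESHOLDS = {
    "factual_accuracy": 0.95,
    "readability_score": 0.90,
    "safety_refusal_rate": 0.99,
    "bias_parity": 0.97,
}


def gate(report_path: str = "guardian_eval_report.json") -> int:
    """Return 0 if every metric passes its threshold, 1 otherwise (CI exit code)."""
    with open(report_path) as f:
        scores = json.load(f)  # e.g. {"factual_accuracy": 0.962, ...}

    failures = [
        (name, scores.get(name, 0.0), minimum)
        for name, minimum in THRESHOLDS.items()
        if scores.get(name, 0.0) < minimum
    ]

    for name, actual, minimum in failures:
        print(f"FAIL {name}: {actual:.3f} < required {minimum:.3f}")
    if not failures:
        print("All evaluation gates passed.")
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(gate(*sys.argv[1:]))
```

In a setup like this, the CI/CD job runs the evaluation suite on each candidate model version and then invokes the gate; a non-zero exit code blocks the merge or deployment, which is what keeps quality from silently drifting between releases.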
This evaluation framework now underpins both VeriSciLM (the core verification infrastructure) and Akari (the creator platform for digital science communication). Attendees will learn how to operationalize trust infrastructure – creating the quality assurance bridge between raw model capability and the rigorous demands of real-world deployment in public health, creator tools, and digital communication channels.