Many workgroups in industry, academia, and clinical practice have combined efforts to research and propose guidelines for the responsible use of AI in healthcare. Pacific AI has studied these guidelines and developed a unified set of AI policies that healthcare organizations can adopt, providing coverage of the guidelines across all of these proposals. The Pacific AI team regularly tracks new and updated publications, standards, and drafts from these workgroups, and releases an updated suite of policies every quarter. This ensures that you always have up-to-date, actionable policies that comply with the guidelines devised by a broad set of experts across the healthcare and life science industries.
Below is the current list of guidelines that the Pacific AI policies have been expanded to comply with:
CHAI Assurance Standards Guide. The Coalition for Health AI (CHAI) is a US-based organization that provides comprehensive guidance on quality and ethics for AI in healthcare. This guide was created by patient advocates, technology developers, clinicians, and data scientists who worked together on a common framework for AI standards in healthcare, based on real-world practices. It is designed for a wide audience, including stakeholders involved in the AI design, development, deployment, and usage process.
FUTURE-AI. The FUTURE-AI Consortium was founded in 2021 and comprises 117 interdisciplinary experts from 50 countries representing all continents, including AI scientists, clinical researchers, biomedical ethicists, and social scientists. Over a two-year period, the FUTURE-AI guideline was established through consensus based on six guiding principles: fairness, universality, traceability, usability, robustness, and explainability. To operationalise trustworthy AI in healthcare, a set of 30 best practices was defined, addressing technical, clinical, socioethical, and legal dimensions. The recommendations cover the entire lifecycle of healthcare AI, from design, development, and validation to regulation, deployment, and monitoring.
QUEST. A framework for human evaluation of LLMs in healthcare derived from literature review. A review of 142 studies shows gaps in the reliability, generalisability, and applicability of human evaluation of LLMs across different healthcare applications. The authors propose QUEST, a comprehensive and practical framework for human evaluation of LLMs covering three workflow phases: Planning, Implementation and Adjudication, and Scoring and Review. QUEST is designed around five evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.
CLAIM. The Checklist for Artificial Intelligence in Medical Imaging (CLAIM) is designed to help authors and reviewers of AI manuscripts in medical imaging.
TEHAI. TEHAI is an evaluation framework that guides the implementation of AI systems into healthcare settings. It addresses three main evaluation components: capability, utility, and adoption, which can be applied at any stage of an AI system's development and deployment.
MEDIC. MEDIC is a framework for assessing LLMs across five critical dimensions of clinical competence: medical reasoning, ethics and bias, data and language understanding, in-context learning, and clinical safety.
STARD-AI. The Standards for Reporting of Diagnostic Accuracy Studies (STARD) were developed to improve the completeness and transparency of reporting in studies investigating diagnostic test accuracy. The STARD-AI extension addresses the issues and challenges raised by AI-centred interventions.
CONSORT-AI. CONSORT-AI extends the CONSORT reporting guideline to clinical trials evaluating interventions with an AI component. The extension includes 14 new items that were considered sufficiently important for AI interventions to be routinely reported in addition to the core CONSORT 2010 items. CONSORT-AI recommends that investigators provide clear descriptions of the AI intervention, including the instructions and skills required for its use, the setting in which it is integrated, the handling of its inputs and outputs, the human-AI interaction, and an analysis of error cases.
MI-CLAIM-GEN. The MI-CLAIM-GEN checklist addresses differences in the training, evaluation, interpretability, and reproducibility of new generative models compared to non-generative ("predictive") AI models. It also clarifies cohort selection reporting with unstructured clinical data and adds items on alignment with ethical standards for clinical AI research.
AMA Guidance on Augmented Intelligence. In November 2023, the American Medical Association's Board of Trustees approved a set of principles developed by the Council on Legislation (COL) that serve as the framework of this report. The main topics addressed include AI oversight, disclosure requirements, liability, data privacy and security, and payor use of AI. Beyond the COL, these principles have been vetted with multiple AMA business units, several medical specialty societies with expertise in AI, and outside experts. The resulting policy builds upon and is supplemental to the AMA's existing AI policy, especially Policy H-480.940, "Augmented Intelligence in Health Care," Policy H-480.939, "Augmented Intelligence in Health Care," and Policy D-480.956, "Use of Augmented Intelligence for Prior Authorization," as well as the AMA's Privacy Principles.
WHO guidance on the ethics and governance of LLMs. The World Health Organization report on Ethics and Governance of AI for Health identifies the ethical challenges and risks of using artificial intelligence in health and sets out six consensus principles to ensure AI works to the public benefit of all countries. It also contains a set of recommendations to ensure that the governance of artificial intelligence for health maximizes the promise of the technology and holds all stakeholders – in the public and private sector – accountable and responsive to the healthcare workers who will rely on these technologies and to the communities and individuals whose health will be affected by their use.
CRAFT-MD. The Conversational Reasoning Assessment Framework for Testing in Medicine (CRAFT-MD) is an approach for evaluating the readiness of large language models for real-world applications that transform doctor-patient interactions. It proposes a comprehensive set of recommendations for the evaluation of clinical LLMs based on empirical findings. These recommendations emphasize realistic doctor-patient conversations, comprehensive history-taking, open-ended questioning, and a combination of automated and expert evaluations.
TRIPOD-LLM. TRIPOD-LLM is an extension of the TRIPOD + AI statement (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) that addresses the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion. The guidelines introduce a modular format accommodating various LLM research designs and tasks, with 14 main items and 32 subitems applicable across all categories. Developed through an expedited Delphi process and expert consensus, TRIPOD-LLM emphasizes transparency, human oversight, and task-specific performance reporting.
The Pacific AI Policy Suite now includes controls and requirements that aim to fully cover the guidelines of all the frameworks above. This enables organizations to adopt one unified AI governance framework that benefits from the collective effort of all the teams who produced these guidelines.
