{"id":188,"date":"2024-11-05T18:55:48","date_gmt":"2024-11-05T18:55:48","guid":{"rendered":"https:\/\/pacific.ai\/staging\/3667\/?p=188"},"modified":"2026-02-19T11:45:02","modified_gmt":"2026-02-19T11:45:02","slug":"automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models","status":"publish","type":"post","link":"https:\/\/pacific.ai\/staging\/3667\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/","title":{"rendered":"Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models"},"content":{"rendered":"<div id=\"bsf_rt_marker\"><\/div><h2>What is Clinical Bias?<\/h2>\n<p>Clinical bias in <a href=\"https:\/\/www.johnsnowlabs.com\/introduction-to-large-language-models-llms-an-overview-of-bert-gpt-and-other-popular-models\/\" target=\"_blank\" rel=\"noopener\">LLM<\/a> (Language Learning Models) refers to the unfair or unequal representation or treatment based on medical or clinical information. This means that if a model processes medical data and generates differing outputs for two identical clinical cases, purely because of some extraneous detail or noise in the data, then the model is exhibiting clinical bias. 
Such biases can arise due to the uneven representation of certain medical conditions, treatments, or patient profiles in the training data.<\/p>\n<p><a title=\"AI governance platform\" href=\"https:\/\/pacific.ai\/staging\/3667\/ai-policies\/\">Addressing clinical bias<\/a> in LLMs requires careful curation and balancing of training data, rigorous model evaluation, and continuous feedback loops with medical professionals to ensure outputs are medically sound and unbiased.<\/p>\n<h2>Why LangTest?<\/h2>\n<figure id=\"attachment_86985\" aria-describedby=\"caption-attachment-86985\" style=\"width: 963px\" class=\"wp-caption aligncenter mb50 tac\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-86985 size-full\" src=\"https:\/\/www.johnsnowlabs.com\/wp-content\/uploads\/2023\/08\/0_AZI7Z94cini1wIrp.webp\" alt=\"Pipeline from testing models to release without demographic bias.\" width=\"963\" height=\"327\" \/><figcaption id=\"caption-attachment-86985\" class=\"wp-caption-text\">LangTest: Deliver Safe and Effective Language Models<\/figcaption><\/figure>\n<p>The surge in healthcare-specific machine learning models has presented a myriad of opportunities and challenges. Many of these models are incorporated into systems without a comprehensive assessment of their robustness, potential biases, or their aptness for real-world deployment. Such oversights often become evident only after deployment, with significant consequences. This post introduces <strong>LangTest<\/strong>, an open-source Python library built to help developers assess and improve <a href=\"https:\/\/www.johnsnowlabs.com\/introduction-to-natural-language-processing\/\" target=\"_blank\" rel=\"noopener\">Natural Language Processing (NLP)<\/a> models. LangTest offers a systematic approach to validate models against biases and perturbations like typos, varying textual cases, and more. 
Furthermore, it integrates seamlessly into deployment and model monitoring pipelines, ensuring models are production-ready. A distinctive feature is its ability to augment datasets and annotations, addressing and rectifying prevalent issues. By adopting LangTest, developers can pave the way for more reliable and unbiased healthcare NLP models, thereby fortifying the overall quality of healthcare applications.<\/p>\n<p>You can find the LangTest library on <a href=\"https:\/\/github.com\/JohnSnowLabs\/langtest\" target=\"_blank\" rel=\"noopener\">GitHub<\/a>, where you can explore its features, documentation, and the latest updates. Additionally, for more information about LangTest and its capabilities, you can visit the official website at <a href=\"https:\/\/langtest.org\/\" target=\"_blank\" rel=\"noopener\">langtest.org<\/a>.<\/p>\n<h2>Demographic Bias in Clinical Models: An Exploration<\/h2>\n<p>Demographic bias has long been a topic of discussion and concern in many fields, including healthcare. Unfair representation or treatment based on factors like age, gender, race, and ethnicity can be detrimental and misleading in crucial domains, like <a title=\"Read more about the impact of medical LLMs on disease diagnosis and treatment\" href=\"https:\/\/www.johnsnowlabs.com\/the-impact-of-medical-llms-on-disease-diagnosis-and-treatment\/\" target=\"_blank\" rel=\"noopener\">medical diagnostics<\/a> and treatment plans.<\/p>\n<h2>What is Demographic Bias?<\/h2>\n<p>Demographic bias refers to unequal representation or treatment based on demographic factors. In clinical domains, if a machine learning model suggests varied treatments for two patients solely due to their demographic details, even when they have the same medical condition, it\u2019s exhibiting demographic bias.<\/p>\n<h2>But Aren\u2019t Demographics Important in Medical Decisions?<\/h2>\n<p>Certainly! 
Medical treatments are often tailored according to a patient\u2019s age, gender, and sometimes even race or ethnicity. For instance, treating a 70-year-old patient might differ considerably from treating a 10-year-old, even if they have the same ailment. Some diseases are even largely sex-specific, like breast cancer or conditions related to the male reproductive organs.<\/p>\n<p>However, there are instances where the demographic data shouldn\u2019t affect the suggested treatment. It\u2019s in these specific scenarios that we need to ensure our models aren\u2019t biased.<\/p>\n<h2>Our Experiment<\/h2>\n<p>To delve deeper, our team curated medical data files for three specialties: Internal Medicine, Gastroenterology, and Oromaxillofacial. Our primary criterion? The demographic details for a given diagnosis in these files should not impact the treatment plan.<\/p>\n<p>Each file comprises three columns:<\/p>\n<ul>\n<li><em>patient_info_A and patient_info_B<\/em>: Contain details related to two hypothetical patients (e.g., age, gender, patient ID, employment status, marital status, race, etc.)<\/li>\n<li><em>Diagnosis<\/em>: The medical condition diagnosed for both patients.<\/li>\n<\/ul>\n<figure class=\"mb50 tac\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-87816 size-full\" src=\"https:\/\/www.johnsnowlabs.com\/wp-content\/uploads\/2023\/09\/1_eySQgRBiBz03MVg3wafQhw.png\" alt=\"Data for testing demographic bias in treatment\" width=\"800\" height=\"227\" \/><\/figure>\n<p>Given our curation, the treatment plans for both patients should ideally be identical despite their varied demographic details.<\/p>\n<p>To test this, we fed patient_info_A together with the diagnosis to the LLM and requested a treatment plan. We then repeated this process using patient_info_B. 
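<\/p>\n<p>The decision rule we apply to each pair of generated plans can be sketched in a few lines of plain Python. This sketch is illustrative only: the toy vectors below stand in for real sentence-embedding outputs, and the 0.8 threshold is an assumed value, not a library default.<\/p>

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def plans_match(embedding_a, embedding_b, threshold=0.8):
    # The bias test passes when the two treatment-plan embeddings are
    # at least `threshold`-similar; a lower score flags potential bias.
    return cosine_similarity(embedding_a, embedding_b) >= threshold

# Toy vectors standing in for sentence-transformer embeddings:
plan_a = [0.9, 0.1, 0.3]     # plan generated from patient_info_A
plan_b = [0.88, 0.12, 0.31]  # near-identical plan from patient_info_B
plan_c = [0.1, 0.9, 0.2]     # a divergent plan

print(plans_match(plan_a, plan_b))  # similar plans pass
print(plans_match(plan_a, plan_c))  # dissimilar plans fail
```

<p>In practice the embeddings come from a sentence-transformer model rather than hand-written vectors, but the threshold logic is the same.<\/p>\n<p>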
This gave us two treatment plans \u2014 let\u2019s call them treatment_plan_A and treatment_plan_B.<\/p>\n<h2>Comparing the Treatment Plans<\/h2>\n<p>The challenge now was to gauge whether these treatment plans were truly similar. Enter sentence transformers! By converting each treatment plan into embeddings using a sentence transformer model, we could then use cosine similarity to measure their resemblance.<\/p>\n<p>We set a similarity threshold, and based on whether the cosine similarity score surpassed this threshold, our test would either pass (suggesting the LLM didn\u2019t exhibit demographic bias) or fail (indicating potential bias).<\/p>\n<h2>Understanding the LangTest Harness Setup<\/h2>\n<p>Imagine you have a toolbox named \u201cHarness\u201d. In this toolbox, you need to fit in three essential tools to make it work: the task it should do, the model it should use, and the data it should work on.<\/p>\n<h3>1. Task (What is the job?)<\/h3>\n<p>Here, you\u2019re telling the toolbox what kind of job it should perform. Since our focus is medical, we tell it to run \u201c<strong>clinical-tests<\/strong>\u201d. But remember, this toolbox is versatile. It can do many other things too! It can recognize names (NER), sort texts (Text classification), detect harmful content (Toxicity), or even answer questions (Question-Answering).<\/p>\n<p>Supported Tasks: <a href=\"https:\/\/langtest.org\/docs\/pages\/docs\/task\" target=\"_blank\" rel=\"noopener\">https:\/\/langtest.org\/docs\/pages\/docs\/task<\/a><\/p>\n<h3>2. Model (Who should do the job?)<\/h3>\n<p>This is like choosing a worker from a team: you\u2019re picking which expert, or \u201cLLM model\u201d, should do the job, like selecting the best craftsman for a specific task.<\/p>\n<p>Supported Models: <a href=\"https:\/\/langtest.org\/docs\/pages\/docs\/model\" target=\"_blank\" rel=\"noopener\">https:\/\/langtest.org\/docs\/pages\/docs\/model<\/a><\/p>\n<h3>3. 
Data (On what should the job be done?)<\/h3>\n<p>Lastly, you need to provide the material on which the expert should work. In our case, since we\u2019re dealing with medical tests, we have three special sets of information or \u201cfiles\u201d:<\/p>\n<ul>\n<li><em>Medical-files<\/em>: General information related to internal medicine.<\/li>\n<li><em>Gastroenterology-files<\/em>: Information specific to stomach-related issues.<\/li>\n<li><em>Oromaxillofacial-files<\/em>: Data about issues concerning the face and jaw.<\/li>\n<\/ul>\n<p>One more detail: you also specify the hub these experts come from. It could be \u201copenai\u201d, \u201cazure-openai\u201d, \u201ccohere\u201d, \u201cai21\u201d, or \u201chuggingface-inference-api\u201d.<\/p>\n<p>In conclusion, setting up the Harness is like preparing a toolbox for a job. You decide the task, choose your worker, and provide the material. And just like that, your toolbox is ready to work its magic!<\/p>\n<p>Supported Data: <a href=\"https:\/\/langtest.org\/docs\/pages\/docs\/data\" target=\"_blank\" rel=\"noopener\">https:\/\/langtest.org\/docs\/pages\/docs\/data<\/a>.<\/p>\n<h2>Testing in 3 lines of Code<\/h2>\n<div class=\"oh\">\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"\">!pip install &quot;langtest[langchain,openai,transformers]&quot;<\/pre>\n<\/div>\n<div class=\"oh\">\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"\">import os\nos.environ[&quot;OPENAI_API_KEY&quot;] = &quot;&lt;YOUR_OPENAI_API_KEY&gt;&quot;<\/pre>\n<\/div>\n<div class=\"oh\">\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"\">from langtest import Harness\nharness = Harness(task=&quot;clinical-tests&quot;,\n                  model={&quot;model&quot;: &quot;text-davinci-003&quot;, &quot;hub&quot;: &quot;openai&quot;},\n                  data={&quot;data_source&quot;: &quot;Medical-files&quot;})\nharness.generate().run().report()<\/pre>\n<\/div>\n<figure class=\"mb50 tac\"><img loading=\"lazy\" decoding=\"async\" 
class=\"aligncenter wp-image-87819 size-full\" src=\"https:\/\/www.johnsnowlabs.com\/wp-content\/uploads\/2023\/09\/1_R-8YjMnvuaaF9m5lZNJZHQ.png\" alt=\"Testing Demographic bias in large language models\" width=\"499\" height=\"57\" \/><\/figure>\n<p>The report provides a comprehensive overview of our test outcomes using the Medical-files data, which comprises 49 entries. Out of these, 4 tests were unsuccessful, while the remaining passed. The default minimum pass rate is set at 70%; however, it\u2019s adjustable based on the desired strictness of the evaluation.<\/p>\n<p>For a more granular view of the treatment plans and their respective similarity scores, you can check the <a title=\"Generative AI in Healthcare\" href=\"https:\/\/www.johnsnowlabs.com\/generative-ai-healthcare\/\">generated<\/a> results <em><b>harness.generated_results()<\/b><\/em><\/p>\n<p>Below is an example of a passed (similar) test-case in which the treatment plans suggested for both the patients are similar.<\/p>\n<figure id=\"attachment_87822\" aria-describedby=\"caption-attachment-87822\" style=\"width: 1200px\" class=\"wp-caption aligncenter mb50 tac\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-87822 size-full\" src=\"https:\/\/www.johnsnowlabs.com\/wp-content\/uploads\/2024\/10\/1_7Q5UQewb5AoCD6mWrVBKfA.webp\" alt=\"Checking racial bias in treatment plans generated by LLM - positive example\" width=\"1200\" height=\"675\" \/><figcaption id=\"caption-attachment-87822\" class=\"wp-caption-text\">Similar Treatment Plans<\/figcaption><\/figure>\n<p>Below is an example of a failed (dissimilar) test-case in which the treatment plans suggested for both the patients are not similar.<\/p>\n<figure id=\"attachment_87824\" aria-describedby=\"caption-attachment-87824\" style=\"width: 1200px\" class=\"wp-caption aligncenter mb50 tac\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-87824 size-full\" 
src=\"https:\/\/www.johnsnowlabs.com\/wp-content\/uploads\/2024\/10\/1_rPbO3u1DgFeXYYZEMLpI7g.webp\" alt=\"Checking bias in treatment plans generated by LLM - negative example\" width=\"1200\" height=\"675\" \/><figcaption id=\"caption-attachment-87824\" class=\"wp-caption-text\">Dissimilar Treatment Plans<\/figcaption><\/figure>\n<h2>Conclusion<\/h2>\n<p>Understanding and addressing demographic bias is pivotal in the medical field. While it\u2019s true that individual characteristics can and should influence some treatment plans, it\u2019s essential to ensure that biases don\u2019t creep into areas where these demographics should have no bearing. Our approach offers a robust way to <a title=\"Generative AI testing\" href=\"https:\/\/pacific.ai\/staging\/3667\/guardian\/\">test<\/a> and confirm that medical AI systems are offering unbiased suggestions, ensuring that all patients receive the best possible care.<\/p>\n<h2>FAQ<\/h2>\n<p><strong>How is demographic bias defined in clinical LLM-generated treatment plans?<\/strong><\/p>\n<p>Demographic bias occurs when an LLM suggests different treatment approaches for two patients with identical diagnoses solely due to differences in demographic details (e.g., age, race), rather than based on clinical factors alone.<\/p>\n<p><strong>How does LangTest detect demographic bias in treatment plan outputs?<\/strong><\/p>\n<p>LangTest runs paired tests: it provides two patient profiles with identical clinical conditions, compares the generated plans using sentence-transformer embeddings and cosine similarity, and flags bias if similarity falls below a set threshold.<\/p>\n<p><strong>How reliable is LangTest\u2019s demographic bias test?<\/strong><\/p>\n<p>The test sets a configurable minimum pass rate (e.g. 
70%); using similarity metrics with well-defined thresholds ensures consistent and transparent detection of unjustified treatment discrepancies.<\/p>\n<p><strong>What real-world studies highlight demographic bias in clinical LLMs?<\/strong><\/p>\n<p>Research from Mount Sinai found LLMs favored invasive procedures or more intense care for sociodemographically labeled patients\u2014even when clinical need didn\u2019t warrant it. This underscores the need for fairness testing using tools like LangTest.<\/p>\n\n\n<script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"FAQPage\",\n  \"mainEntity\": [\n    {\n      \"@type\": \"Question\",\n      \"name\": \"How is demographic bias defined in clinical LLM-generated treatment plans?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Demographic bias occurs when an LLM suggests different treatment approaches for two patients with identical diagnoses solely due to differences in demographic details (e.g., age, race), rather than based on clinical factors alone.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"How does LangTest detect demographic bias in treatment plan outputs?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"LangTest runs paired tests: it provides two patient profiles with identical clinical conditions, compares the generated plans using sentence-transformer embeddings and cosine similarity, and flags bias if similarity falls below a set threshold.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"How reliable is LangTest\u2019s demographic bias test?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"The test sets a configurable minimum pass rate (e.g. 
70%); using similarity metrics with well-defined thresholds ensures consistent and transparent detection of unjustified treatment discrepancies.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"What real-world studies highlight demographic bias in clinical LLMs?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Research from Mount Sinai found LLMs favored invasive procedures or more intense care for sociodemographically labeled patients\u2014even when clinical need didn\u2019t warrant it. This underscores the need for fairness testing using tools like LangTest.\"\n      }\n    }\n  ]\n}\n<\/script>\n","protected":false},"excerpt":{"rendered":"<p>What is Clinical Bias? Clinical bias in LLM (Language Learning Models) refers to the unfair or unequal representation or treatment based on medical or clinical information. This means that if a model processes medical data and generates differing outputs for two identical clinical cases, purely because of some extraneous detail or noise in the data, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":336,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"nf_dc_page":"","content-type":"","inline_featured_image":false,"footnotes":""},"categories":[118],"tags":[],"class_list":["post-188","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-articles"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models - Pacific AI<\/title>\n<meta name=\"description\" content=\"If you are interested in the state-of-the-art AI solutions, get more in the article Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large 
Language Models\" \/>\n<meta name=\"robots\" content=\"noindex, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models - Pacific AI\" \/>\n<meta property=\"og:description\" content=\"If you are interested in the state-of-the-art AI solutions, get more in the article Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models\" \/>\n<meta property=\"og:url\" content=\"https:\/\/pacific.ai\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/\" \/>\n<meta property=\"og:site_name\" content=\"Pacific AI\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/people\/Pacific-AI\/61566807347567\/\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-05T18:55:48+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-02-19T11:45:02+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/pacific.ai\/wp-content\/uploads\/2024\/12\/14-1.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"550\" \/>\n\t<meta property=\"og:image:height\" content=\"440\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"David Talby\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"David Talby\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\\\/\"},\"author\":{\"name\":\"David Talby\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#\\\/schema\\\/person\\\/8a2b4d5d75c8752d83ae6bb1d44e0186\"},\"headline\":\"Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models\",\"datePublished\":\"2024-11-05T18:55:48+00:00\",\"dateModified\":\"2026-02-19T11:45:02+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\\\/\"},\"wordCount\":1422,\"publisher\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2024\\\/12\\\/14-1.webp\",\"articleSection\":[\"Articles\"],\"inLanguage\":\"en\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\\\/\",\"url\":\"https:\\\/\\\/pacific.ai\\\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\\\/\",\"name\":\"Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models - Pacific 
AI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2024\\\/12\\\/14-1.webp\",\"datePublished\":\"2024-11-05T18:55:48+00:00\",\"dateModified\":\"2026-02-19T11:45:02+00:00\",\"description\":\"If you are interested in the state-of-the-art AI solutions, get more in the article Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\\\/#breadcrumb\"},\"inLanguage\":\"en\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/pacific.ai\\\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\\\/#primaryimage\",\"url\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2024\\\/12\\\/14-1.webp\",\"contentUrl\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2024\\\/12\\\/14-1.webp\",\"width\":550,\"height\":440,\"caption\":\"Automated testing framework for detecting demographic bias in clinical treatment plans generated by large language models, highlighting secure medical documents, validation checks, and responsible AI evaluation in 
healthcare.\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/pacific.ai\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#website\",\"url\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/\",\"name\":\"Pacific AI\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#organization\",\"name\":\"Pacific AI\",\"url\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2025\\\/06\\\/site_logo.svg\",\"contentUrl\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2025\\\/06\\\/site_logo.svg\",\"width\":182,\"height\":41,\"caption\":\"Pacific 
AI\"},\"image\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/people\\\/Pacific-AI\\\/61566807347567\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/pacific-ai\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#\\\/schema\\\/person\\\/8a2b4d5d75c8752d83ae6bb1d44e0186\",\"name\":\"David Talby\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2025\\\/03\\\/David_portret-96x96.webp\",\"url\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2025\\\/03\\\/David_portret-96x96.webp\",\"contentUrl\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2025\\\/03\\\/David_portret-96x96.webp\",\"caption\":\"David Talby\"},\"description\":\"David Talby is a CTO at Pacific AI, helping healthcare &amp; life science companies put AI to good use. David is the creator of Spark NLP \u2013 the world\u2019s most widely used natural language processing library in the enterprise. He has extensive experience building and running web-scale software platforms and teams \u2013 in startups, for Microsoft\u2019s Bing in the US and Europe, and to scale Amazon\u2019s financial systems in Seattle and the UK. David holds a PhD in computer science and master\u2019s degrees in both computer science and business administration.\",\"sameAs\":[\"https:\\\/\\\/www.linkedin.com\\\/in\\\/davidtalby\\\/\"],\"url\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/author\\\/david\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models - Pacific AI","description":"If you are interested in the state-of-the-art AI solutions, get more in the article Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models","robots":{"index":"noindex","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"og_locale":"en_US","og_type":"article","og_title":"Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models - Pacific AI","og_description":"If you are interested in the state-of-the-art AI solutions, get more in the article Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models","og_url":"https:\/\/pacific.ai\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/","og_site_name":"Pacific AI","article_publisher":"https:\/\/www.facebook.com\/people\/Pacific-AI\/61566807347567\/","article_published_time":"2024-11-05T18:55:48+00:00","article_modified_time":"2026-02-19T11:45:02+00:00","og_image":[{"width":550,"height":440,"url":"https:\/\/pacific.ai\/wp-content\/uploads\/2024\/12\/14-1.webp","type":"image\/webp"}],"author":"David Talby","twitter_card":"summary_large_image","twitter_misc":{"Written by":"David Talby","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/pacific.ai\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/#article","isPartOf":{"@id":"https:\/\/pacific.ai\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/"},"author":{"name":"David Talby","@id":"https:\/\/pacific.ai\/staging\/3667\/#\/schema\/person\/8a2b4d5d75c8752d83ae6bb1d44e0186"},"headline":"Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models","datePublished":"2024-11-05T18:55:48+00:00","dateModified":"2026-02-19T11:45:02+00:00","mainEntityOfPage":{"@id":"https:\/\/pacific.ai\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/"},"wordCount":1422,"publisher":{"@id":"https:\/\/pacific.ai\/staging\/3667\/#organization"},"image":{"@id":"https:\/\/pacific.ai\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/#primaryimage"},"thumbnailUrl":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2024\/12\/14-1.webp","articleSection":["Articles"],"inLanguage":"en"},{"@type":"WebPage","@id":"https:\/\/pacific.ai\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/","url":"https:\/\/pacific.ai\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/","name":"Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models - Pacific 
AI","isPartOf":{"@id":"https:\/\/pacific.ai\/staging\/3667\/#website"},"primaryImageOfPage":{"@id":"https:\/\/pacific.ai\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/#primaryimage"},"image":{"@id":"https:\/\/pacific.ai\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/#primaryimage"},"thumbnailUrl":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2024\/12\/14-1.webp","datePublished":"2024-11-05T18:55:48+00:00","dateModified":"2026-02-19T11:45:02+00:00","description":"If you are interested in the state-of-the-art AI solutions, get more in the article Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models","breadcrumb":{"@id":"https:\/\/pacific.ai\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/#breadcrumb"},"inLanguage":"en","potentialAction":[{"@type":"ReadAction","target":["https:\/\/pacific.ai\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/"]}]},{"@type":"ImageObject","inLanguage":"en","@id":"https:\/\/pacific.ai\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/#primaryimage","url":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2024\/12\/14-1.webp","contentUrl":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2024\/12\/14-1.webp","width":550,"height":440,"caption":"Automated testing framework for detecting demographic bias in clinical treatment plans generated by large language models, highlighting secure medical documents, validation checks, and responsible AI evaluation in 
healthcare."},{"@type":"BreadcrumbList","@id":"https:\/\/pacific.ai\/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-models\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/pacific.ai\/"},{"@type":"ListItem","position":2,"name":"Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models"}]},{"@type":"WebSite","@id":"https:\/\/pacific.ai\/staging\/3667\/#website","url":"https:\/\/pacific.ai\/staging\/3667\/","name":"Pacific AI","description":"","publisher":{"@id":"https:\/\/pacific.ai\/staging\/3667\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/pacific.ai\/staging\/3667\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en"},{"@type":"Organization","@id":"https:\/\/pacific.ai\/staging\/3667\/#organization","name":"Pacific AI","url":"https:\/\/pacific.ai\/staging\/3667\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/pacific.ai\/staging\/3667\/#\/schema\/logo\/image\/","url":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2025\/06\/site_logo.svg","contentUrl":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2025\/06\/site_logo.svg","width":182,"height":41,"caption":"Pacific AI"},"image":{"@id":"https:\/\/pacific.ai\/staging\/3667\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/people\/Pacific-AI\/61566807347567\/","https:\/\/www.linkedin.com\/company\/pacific-ai\/"]},{"@type":"Person","@id":"https:\/\/pacific.ai\/staging\/3667\/#\/schema\/person\/8a2b4d5d75c8752d83ae6bb1d44e0186","name":"David 
Talby","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2025\/03\/David_portret-96x96.webp","url":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2025\/03\/David_portret-96x96.webp","contentUrl":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2025\/03\/David_portret-96x96.webp","caption":"David Talby"},"description":"David Talby is the CTO at Pacific AI, helping healthcare &amp; life science companies put AI to good use. David is the creator of Spark NLP \u2013 the world\u2019s most widely used natural language processing library in the enterprise. He has extensive experience building and running web-scale software platforms and teams \u2013 in startups, for Microsoft\u2019s Bing in the US and Europe, and scaling Amazon\u2019s financial systems in Seattle and the UK. David holds a PhD in computer science and master\u2019s degrees in both computer science and business administration.","sameAs":["https:\/\/www.linkedin.com\/in\/davidtalby\/"],"url":"https:\/\/pacific.ai\/staging\/3667\/author\/david\/"}]}},"_links":{"self":[{"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/posts\/188","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/comments?post=188"}],"version-history":[{"count":9,"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/posts\/188\/revisions"}],"predecessor-version":[{"id":2060,"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/posts\/188\/revisions\/2060"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/media\/336"}],"wp:attachment":
[{"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/media?parent=188"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/categories?post=188"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/tags?post=188"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}