{"id":198,"date":"2024-11-05T19:09:54","date_gmt":"2024-11-05T19:09:54","guid":{"rendered":"https:\/\/pacific.ai\/staging\/3667\/?p=198"},"modified":"2026-02-19T11:31:15","modified_gmt":"2026-02-19T11:31:15","slug":"building-responsible-language-models-with-the-langtest-library","status":"publish","type":"post","link":"https:\/\/pacific.ai\/staging\/3667\/building-responsible-language-models-with-the-langtest-library\/","title":{"rendered":"Building Responsible Language Models with the LangTest Library"},"content":{"rendered":"<div id=\"bsf_rt_marker\"><\/div><p><em><a href=\"https:\/\/pacific.ai\/staging\/3667\/product\/\">Automatically generate test<\/a> cases, run tests, and augment training datasets with the open-source, easy-to-use, cross-library LangTest package<\/em><\/p>\n<figure class=\"mb50 tac\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-80629 aligncenter\" src=\"https:\/\/www.johnsnowlabs.com\/wp-content\/uploads\/2023\/05\/3.png\" alt=\"\" width=\"1429\" height=\"804\" \/><\/figure>\n<p>If your goal is to deliver NLP systems for production systems, you are responsible for delivering models that are robust, safe, fair, unbiased, and private &#8211; in addition to being highly accurate. <a title=\"AI governance tool\" href=\"https:\/\/pacific.ai\/staging\/3667\/ai-policies\/\">This requires having the tools<\/a> &amp; processes to test for these requiremenst in practice &#8211; as part of your day-to-day work, your team\u2019s work, and on every new version of a model.<\/p>\n<p>The <a href=\"https:\/\/langtest.org\/\" target=\"_blank\" rel=\"noopnener noopener\"><strong>LangTest library<\/strong><\/a> is designed to help you do that, by providing comprehensive testing capabilities for both models and data. It allows you to easily generate, run, and customize tests to ensure your NLP systems are production-ready. 
With support for popular NLP libraries like Hugging Face Transformers, Spark NLP, OpenAI, and spaCy, LangTest is an extensible and flexible solution for any NLP project.<\/p>\n<p>In this article, we\u2019ll dive into three main tasks that the LangTest library helps you automate: generating tests, running tests, and augmenting data.<\/p>\n<h2>Automatically Generate Tests<\/h2>\n<p>Unlike the testing libraries of the past, LangTest allows for the automatic generation of tests &#8211; to an extent. Each <code class=\"code_inline\">TestFactory<\/code> can specify multiple test types and implement a test case generator and runner for each one.<\/p>\n<p>The generated tests are presented as a table with &#8216;test case&#8217; and &#8216;expected result&#8217; columns that correspond to the specific test. These columns are designed to be easily understood by business analysts, who can manually review, modify, add, or remove test cases as needed. For instance, consider the test cases generated by the <code class=\"code_inline\">RobustnessTestFactory<\/code> for an NER task on the phrase \u201cI live in Berlin.\u201d:<\/p>\n<table class=\"table1\">\n<thead>\n<tr>\n<th width=\"208\"><strong>Test type<\/strong><\/th>\n<th width=\"218\"><strong>Test case<\/strong><\/th>\n<th width=\"197\"><strong>Expected result<\/strong><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td width=\"208\">remove_punctuation<\/td>\n<td width=\"218\">I live in Berlin<\/td>\n<td width=\"197\">Berlin: Location<\/td>\n<\/tr>\n<tr>\n<td width=\"208\">lowercase<\/td>\n<td width=\"218\">i live in berlin.<\/td>\n<td width=\"197\">berlin: Location<\/td>\n<\/tr>\n<tr>\n<td width=\"208\">add_typos<\/td>\n<td width=\"218\">I liive in Berlin.<\/td>\n<td width=\"197\">Berlin: Location<\/td>\n<\/tr>\n<tr>\n<td width=\"208\">add_context<\/td>\n<td width=\"218\">I live in Berlin.
#citylife<\/td>\n<td width=\"197\">Berlin: Location<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Starting from the text &#8220;John Smith is responsible&#8221;, the <code class=\"code_inline\">BiasTestFactory<\/code> generates the following test cases for a text classification task using US ethnicity-based name replacement:<\/p>\n<table class=\"table1\">\n<thead>\n<tr>\n<th width=\"222\"><strong>Test type<\/strong><\/th>\n<th width=\"247\"><strong>Test case<\/strong><\/th>\n<th width=\"155\"><strong>Expected result<\/strong><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td width=\"222\">replace_to_asian_name<\/td>\n<td width=\"247\">Wang Li is responsible<\/td>\n<td width=\"155\">positive_sentiment<\/td>\n<\/tr>\n<tr>\n<td width=\"222\">replace_to_black_name<\/td>\n<td width=\"247\">Darnell Johnson is responsible<\/td>\n<td width=\"155\">negative_sentiment<\/td>\n<\/tr>\n<tr>\n<td width=\"222\">replace_to_native_american_name<\/td>\n<td width=\"247\">Dakota Begay is responsible<\/td>\n<td width=\"155\">neutral_sentiment<\/td>\n<\/tr>\n<tr>\n<td width=\"222\">replace_to_hispanic_name<\/td>\n<td width=\"247\">Juan Moreno is responsible<\/td>\n<td width=\"155\">negative_sentiment<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Here are test cases generated by the <code class=\"code_inline\">FairnessTestFactory<\/code> and <code class=\"code_inline\">RepresentationTestFactory<\/code> classes, which help ensure representation and fairness in the model&#8217;s evaluation. For instance, representation testing might require a test dataset with a minimum of 30 samples of male, female, and unspecified genders each.
Meanwhile, fairness testing can set a minimum F1 score of 0.85 for the tested model when evaluated on data subsets with individuals from each of these gender categories.<\/p>\n<table class=\"table1\">\n<thead>\n<tr>\n<th width=\"190\"><strong>Test type<\/strong><\/th>\n<th width=\"226\"><strong>Test case<\/strong><\/th>\n<th width=\"208\"><strong>Expected result<\/strong><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td width=\"190\">min_gender_representation<\/td>\n<td width=\"226\">Male<\/td>\n<td width=\"208\">30<\/td>\n<\/tr>\n<tr>\n<td width=\"190\">min_gender_representation<\/td>\n<td width=\"226\">Female<\/td>\n<td width=\"208\">30<\/td>\n<\/tr>\n<tr>\n<td width=\"190\">min_gender_representation<\/td>\n<td width=\"226\">Unknown<\/td>\n<td width=\"208\">30<\/td>\n<\/tr>\n<tr>\n<td width=\"190\">min_gender_f1_score<\/td>\n<td width=\"226\">Male<\/td>\n<td width=\"208\">0.85<\/td>\n<\/tr>\n<tr>\n<td width=\"190\">min_gender_f1_score<\/td>\n<td width=\"226\">Female<\/td>\n<td width=\"208\">0.85<\/td>\n<\/tr>\n<tr>\n<td width=\"190\">min_gender_f1_score<\/td>\n<td width=\"226\">Unknown<\/td>\n<td width=\"208\">0.85<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The following are important points to note regarding test cases:<\/p>\n<ul>\n<li>Each test type has its own interpretation of \u201ctest case\u201d and \u201cexpected result,\u201d which should be human-readable. After calling <code class=\"code_inline\">h.generate()<\/code>, it is possible to manually review the list of generated test cases and determine which ones to keep or modify.<\/li>\n<li>Given that the test table is a pandas data frame, it is editable within the notebook (with Qgrid) or exportable as a CSV file to allow business analysts to edit it in Excel.<\/li>\n<li>While automation handles 80% of the work, manual checks are necessary.
For instance, a fake news detector&#8217;s test case may show a mismatch between the expected and actual prediction if it replaces &#8220;Paris is the capital of France&#8221; with &#8220;Paris is the capital of Sudan&#8221; using a <code class=\"code_inline\">replace_to_lower_income_country<\/code> test type.<\/li>\n<li>Tests must align with business requirements, and you must validate this alignment. For instance, the <code class=\"code_inline\">FairnessTestFactory<\/code> does not test non-binary or other gender identities or mandate nearly equal accuracy across genders. However, the decisions made are clear, human-readable, and easy to modify.<\/li>\n<li>Test types may produce only one test case or hundreds of them, depending on the configuration. Each <code class=\"code_inline\">TestFactory<\/code> defines its own set of configuration parameters.<\/li>\n<li>By design, <code class=\"code_inline\">TestFactory<\/code> classes are usually task, language, locale, and domain-specific, enabling simpler and more modular test factories.<\/li>\n<\/ul>\n<h2>Running Tests<\/h2>\n<p>To use the test cases that have been generated and edited, follow these steps:<\/p>\n<ul>\n<li>Execute <code class=\"code_inline\">h.run()<\/code> to run all the tests. For each test case in the test harness&#8217;s table, the corresponding <code class=\"code_inline\">TestFactory<\/code> will be called to execute the test and return a flag indicating whether the test passed or failed, along with a descriptive message.<\/li>\n<li>After calling <code class=\"code_inline\">h.run()<\/code>, call <code class=\"code_inline\">h.report()<\/code>. This function will group the pass ratio by test type, display a summary table of the results, and return a flag indicating whether the model passed the entire test suite.<\/li>\n<li>To store the test harness, including the test table, as a set of files, call <code class=\"code_inline\">h.save()<\/code>.
This will enable you to load and run the same test suite later, for example, when conducting a regression test.<\/li>\n<\/ul>\n<p>Below is an example of a report generated for a Named Entity Recognition (NER) model, applying tests from five test factories:<\/p>\n<table class=\"table1\">\n<thead>\n<tr>\n<th width=\"113\"><strong>Category<\/strong><\/th>\n<th width=\"216\"><strong>Test type<\/strong><\/th>\n<th width=\"54\"><strong>Fail count<\/strong><\/th>\n<th width=\"56\"><strong>Pass count<\/strong><\/th>\n<th width=\"50\"><strong>Pass rate<\/strong><\/th>\n<th width=\"80\"><strong>Minimum pass rate<\/strong><\/th>\n<th width=\"55\"><strong>Pass?<\/strong><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td width=\"113\">robustness<\/td>\n<td width=\"216\">remove_punctuation<\/td>\n<td width=\"54\">45<\/td>\n<td width=\"56\">252<\/td>\n<td width=\"50\">85%<\/td>\n<td width=\"80\">75%<\/td>\n<td width=\"55\">TRUE<\/td>\n<\/tr>\n<tr>\n<td width=\"113\">bias<\/td>\n<td width=\"216\">replace_to_asian_name<\/td>\n<td width=\"54\">110<\/td>\n<td width=\"56\">169<\/td>\n<td width=\"50\">65%<\/td>\n<td width=\"80\">80%<\/td>\n<td width=\"55\">FALSE<\/td>\n<\/tr>\n<tr>\n<td width=\"113\">representation<\/td>\n<td width=\"216\">min_gender_representation<\/td>\n<td width=\"54\">0<\/td>\n<td width=\"56\">3<\/td>\n<td width=\"50\">100%<\/td>\n<td width=\"80\">100%<\/td>\n<td width=\"55\">TRUE<\/td>\n<\/tr>\n<tr>\n<td width=\"113\">fairness<\/td>\n<td width=\"216\">min_gender_f1_score<\/td>\n<td width=\"54\">1<\/td>\n<td width=\"56\">2<\/td>\n<td width=\"50\">67%<\/td>\n<td width=\"80\">100%<\/td>\n<td width=\"55\">FALSE<\/td>\n<\/tr>\n<tr>\n<td width=\"113\">accuracy<\/td>\n<td width=\"216\">min_macro_f1_score<\/td>\n<td width=\"54\">0<\/td>\n<td width=\"56\">1<\/td>\n<td width=\"50\">100%<\/td>\n<td width=\"80\">100%<\/td>\n<td width=\"55\">TRUE<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>All the metrics calculated by LangTest, including the F1 score, bias score, and robustness
score, are framed as tests with pass or fail outcomes. This approach requires you to specify the functionality of your application clearly, allowing for quicker and more confident model deployment. Furthermore, it enables you to share your test suite with regulators who can review or replicate your results.<\/p>\n<h2>Data Augmentation<\/h2>\n<p>A common approach to improving the robustness or reducing the bias of your model is to add new training data that specifically targets these gaps. For instance, if the original dataset primarily consists of clean text without typos, slang, or grammatical errors, or doesn&#8217;t represent Muslim or Hindi names, adding such examples to the training dataset will help the model learn to handle them more effectively.<\/p>\n<p>The same method that generates tests can also automatically generate examples that improve the model&#8217;s performance. Here is the workflow for data augmentation:<\/p>\n<ol>\n<li>To automatically generate augmented training data based on the results from your tests, call <code class=\"code_inline\">h.augment()<\/code> after generating and running the tests. However, note that this dataset must be freshly generated, and the test suite cannot be used to retrain the model, as testing a model on data it was trained on would result in data leakage and artificially inflated test scores.<\/li>\n<li>You can review and edit the freshly generated augmented dataset as needed, and then use it to retrain or fine-tune your original model.
It is available as a pandas data frame.<\/li>\n<li>To evaluate the newly trained model on the same test suite it failed on before, create a new test harness and call <code class=\"code_inline\">h.load()<\/code> followed by <code class=\"code_inline\">h.run()<\/code> and <code class=\"code_inline\">h.report()<\/code>.<\/li>\n<\/ol>\n<p>By following this iterative process, NLP data scientists can improve their models while ensuring compliance with their ethical standards, corporate guidelines, and regulatory requirements.<\/p>\n<h2>Getting Started<\/h2>\n<p>Visit <a href=\"https:\/\/langtest.org\/\" target=\"_blank\" rel=\"noopener\">langtest.org<\/a> or run <code class=\"code_inline\">pip install langtest<\/code> to get started with the LangTest library, which is freely available. Additionally, LangTest is an early-stage open-source community project that you are welcome to join.<\/p>\n<p>John Snow Labs has assigned a full development team to the project, and will continue to enhance the library for years to come, as it does for our other open-source libraries. Regular releases with new test types, tasks, languages, and platforms are expected. Contributing code, sharing examples and documentation, or providing feedback will help you get what you need faster. Join the discussion on <a href=\"https:\/\/github.com\/JohnSnowLabs\/langtest\" target=\"_blank\" rel=\"noopener\">LangTest&#8217;s GitHub page<\/a>.
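<\/p>
<p>As a closing illustration, the pass-or-fail reporting described above can be sketched in a few lines of plain Python. This is a toy version for intuition only &#8211; not LangTest&#8217;s actual <code class=\"code_inline\">h.report()<\/code> implementation:<\/p>

```python
# Toy sketch of a pass/fail report (illustrative only, not LangTest's code):
# group test results by test type, compute each pass rate, and compare it to
# a configured minimum pass rate to decide whether the test type passes.
def report(results, min_pass_rates):
    counts = {}  # test_type -> [fail_count, pass_count]
    for test_type, passed in results:
        counts.setdefault(test_type, [0, 0])[int(passed)] += 1
    return {
        t: {
            "pass_rate": passes / (fails + passes),
            "passes": passes / (fails + passes) >= min_pass_rates[t],
        }
        for t, (fails, passes) in counts.items()
    }

# Numbers from the remove_punctuation row of the report table above:
# 252 passes and 45 failures against a 75% minimum pass rate.
summary = report(
    [("remove_punctuation", True)] * 252 + [("remove_punctuation", False)] * 45,
    {"remove_punctuation": 0.75},
)
```

<p>Running this on the <code class=\"code_inline\">remove_punctuation<\/code> numbers yields a pass rate of about 85%, above the 75% minimum, so the test type passes.<\/p>
<p>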
Let&#8217;s work together to make safe, reliable, and responsible NLP a reality.<\/p>\n<h2>FAQ<\/h2>\n<p><strong>What is LangTest and how does it support model safety and fairness?<\/strong><\/p>\n<p>LangTest is an open-source Python toolkit that helps data scientists automatically generate and run over 60 types of tests\u2014covering accuracy, robustness, bias, representation, and fairness\u2014to ensure NLP and LLM models meet production readiness standards.<\/p>\n<p><strong>How does LangTest create test cases automatically?<\/strong><\/p>\n<p>Using configurable TestFactory classes (e.g. BiasTestFactory, RobustnessTestFactory), LangTest generates human-readable test tables\u2014like adding typos or demographic name swaps\u2014for analysts to review and adjust before execution.<\/p>\n<p><strong>Can LangTest help improve models beyond testing?<\/strong><\/p>\n<p>Yes. With the h.augment() method, LangTest can generate additional training data targeting identified weaknesses (e.g. underrepresented names or typo tolerance), which teams can manually review and add to retrain models.<\/p>\n<p><strong>How are test results presented and used in deployment decisions?<\/strong><\/p>\n<p>After running tests, LangTest produces a report summarizing pass rates by test category. 
Teams can save test suites and track performance over time\u2014automating governance checks in CI\/CD pipelines.<\/p>\n<p><strong>Who is the LangTest toolkit designed for?<\/strong><\/p>\n<p>It\u2019s intended for NLP and ML engineers, data scientists, and business analysts who need robust, transparent, and extensible testing of language models across frameworks like Spark NLP, Hugging Face, spaCy, and OpenAI APIs.<\/p>\n\n\n<script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"FAQPage\",\n  \"mainEntity\": [\n    {\n      \"@type\": \"Question\",\n      \"name\": \"What is LangTest and how does it support model safety and fairness?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"LangTest is an open-source Python toolkit that helps data scientists automatically generate and run over 60 types of tests\u2014covering accuracy, robustness, bias, representation, and fairness\u2014to ensure NLP and LLM models meet production readiness standards.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"How does LangTest create test cases automatically?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Using configurable TestFactory classes (e.g. BiasTestFactory, RobustnessTestFactory), LangTest generates human-readable test tables\u2014like adding typos or demographic name swaps\u2014for analysts to review and adjust before execution.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"Can LangTest help improve models beyond testing?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Yes. With the h.augment() method, LangTest can generate additional training data targeting identified weaknesses (e.g. 
underrepresented names or typo tolerance), which teams can manually review and add to retrain models.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"How are test results presented and used in deployment decisions?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"After running tests, LangTest produces a report summarizing pass rates by test category. Teams can save test suites and track performance over time\u2014automating governance checks in CI\/CD pipelines.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"Who is the LangTest toolkit designed for?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"It\u2019s intended for NLP and ML engineers, data scientists, and business analysts who need robust, transparent, and extensible testing of language models across frameworks like Spark NLP, Hugging Face, spaCy, and OpenAI APIs.\"\n      }\n    }\n  ]\n}\n<\/script>\n","protected":false},"excerpt":{"rendered":"<p>Automatically generate test cases, run tests, and augment training datasets with the open-source, easy-to-use, cross-library LangTest package If your goal is to deliver NLP systems for production systems, you are responsible for delivering models that are robust, safe, fair, unbiased, and private &#8211; in addition to being highly accurate. 
This requires having the tools &amp; [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":334,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"nf_dc_page":"","content-type":"","inline_featured_image":false,"footnotes":""},"categories":[118],"tags":[],"class_list":["post-198","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-articles"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Building Responsible Language Models with the LangTest Library - Pacific AI<\/title>\n<meta name=\"description\" content=\"Automatically generate test cases, run tests, and augment training datasets with the open-source, easy-to-use, cross-library LangTest package\" \/>\n<meta name=\"robots\" content=\"noindex, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building Responsible Language Models with the LangTest Library - Pacific AI\" \/>\n<meta property=\"og:description\" content=\"Automatically generate test cases, run tests, and augment training datasets with the open-source, easy-to-use, cross-library LangTest package\" \/>\n<meta property=\"og:url\" content=\"https:\/\/pacific.ai\/building-responsible-language-models-with-the-langtest-library\/\" \/>\n<meta property=\"og:site_name\" content=\"Pacific AI\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/people\/Pacific-AI\/61566807347567\/\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-05T19:09:54+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-02-19T11:31:15+00:00\" \/>\n<meta property=\"og:image\" 
content=\"https:\/\/pacific.ai\/wp-content\/uploads\/2024\/12\/4-3.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"550\" \/>\n\t<meta property=\"og:image:height\" content=\"440\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"David Talby\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"David Talby\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/building-responsible-language-models-with-the-langtest-library\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/building-responsible-language-models-with-the-langtest-library\\\/\"},\"author\":{\"name\":\"David Talby\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#\\\/schema\\\/person\\\/8a2b4d5d75c8752d83ae6bb1d44e0186\"},\"headline\":\"Building Responsible Language Models with the LangTest 
Library\",\"datePublished\":\"2024-11-05T19:09:54+00:00\",\"dateModified\":\"2026-02-19T11:31:15+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/building-responsible-language-models-with-the-langtest-library\\\/\"},\"wordCount\":1496,\"publisher\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/building-responsible-language-models-with-the-langtest-library\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2024\\\/12\\\/4-3.webp\",\"articleSection\":[\"Articles\"],\"inLanguage\":\"en\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/building-responsible-language-models-with-the-langtest-library\\\/\",\"url\":\"https:\\\/\\\/pacific.ai\\\/building-responsible-language-models-with-the-langtest-library\\\/\",\"name\":\"Building Responsible Language Models with the LangTest Library - Pacific AI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/building-responsible-language-models-with-the-langtest-library\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/building-responsible-language-models-with-the-langtest-library\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2024\\\/12\\\/4-3.webp\",\"datePublished\":\"2024-11-05T19:09:54+00:00\",\"dateModified\":\"2026-02-19T11:31:15+00:00\",\"description\":\"Automatically generate test cases, run tests, and augment training datasets with the open-source, easy-to-use, cross-library LangTest 
package\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/building-responsible-language-models-with-the-langtest-library\\\/#breadcrumb\"},\"inLanguage\":\"en\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/pacific.ai\\\/building-responsible-language-models-with-the-langtest-library\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/building-responsible-language-models-with-the-langtest-library\\\/#primaryimage\",\"url\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2024\\\/12\\\/4-3.webp\",\"contentUrl\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2024\\\/12\\\/4-3.webp\",\"width\":550,\"height\":440,\"caption\":\"Building responsible language models with the LangTest library, illustrating automated testing for bias, robustness, and safety in large language models to support trustworthy and governance-ready AI systems.\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/building-responsible-language-models-with-the-langtest-library\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/pacific.ai\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Building Responsible Language Models with the LangTest Library\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#website\",\"url\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/\",\"name\":\"Pacific 
AI\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#organization\",\"name\":\"Pacific AI\",\"url\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2025\\\/06\\\/site_logo.svg\",\"contentUrl\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2025\\\/06\\\/site_logo.svg\",\"width\":182,\"height\":41,\"caption\":\"Pacific AI\"},\"image\":{\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/people\\\/Pacific-AI\\\/61566807347567\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/pacific-ai\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/#\\\/schema\\\/person\\\/8a2b4d5d75c8752d83ae6bb1d44e0186\",\"name\":\"David Talby\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2025\\\/03\\\/David_portret-96x96.webp\",\"url\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2025\\\/03\\\/David_portret-96x96.webp\",\"contentUrl\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/wp-content\\\/uploads\\\/2025\\\/03\\\/David_portret-96x96.webp\",\"caption\":\"David Talby\"},\"description\":\"David Talby is a CTO at 
Pacific AI, helping healthcare &amp; life science companies put AI to good use. David is the creator of Spark NLP \u2013 the world\u2019s most widely used natural language processing library in the enterprise. He has extensive experience building and running web-scale software platforms and teams \u2013 in startups, for Microsoft\u2019s Bing in the US and Europe, and to scale Amazon\u2019s financial systems in Seattle and the UK. David holds a PhD in computer science and master\u2019s degrees in both computer science and business administration.\",\"sameAs\":[\"https:\\\/\\\/www.linkedin.com\\\/in\\\/davidtalby\\\/\"],\"url\":\"https:\\\/\\\/pacific.ai\\\/staging\\\/3667\\\/author\\\/david\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Building Responsible Language Models with the LangTest Library - Pacific AI","description":"Automatically generate test cases, run tests, and augment training datasets with the open-source, easy-to-use, cross-library LangTest package","robots":{"index":"noindex","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"og_locale":"en_US","og_type":"article","og_title":"Building Responsible Language Models with the LangTest Library - Pacific AI","og_description":"Automatically generate test cases, run tests, and augment training datasets with the open-source, easy-to-use, cross-library LangTest package","og_url":"https:\/\/pacific.ai\/building-responsible-language-models-with-the-langtest-library\/","og_site_name":"Pacific AI","article_publisher":"https:\/\/www.facebook.com\/people\/Pacific-AI\/61566807347567\/","article_published_time":"2024-11-05T19:09:54+00:00","article_modified_time":"2026-02-19T11:31:15+00:00","og_image":[{"width":550,"height":440,"url":"https:\/\/pacific.ai\/wp-content\/uploads\/2024\/12\/4-3.webp","type":"image\/webp"}],"author":"David 
Talby","twitter_card":"summary_large_image","twitter_misc":{"Written by":"David Talby","Est. reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/pacific.ai\/building-responsible-language-models-with-the-langtest-library\/#article","isPartOf":{"@id":"https:\/\/pacific.ai\/building-responsible-language-models-with-the-langtest-library\/"},"author":{"name":"David Talby","@id":"https:\/\/pacific.ai\/staging\/3667\/#\/schema\/person\/8a2b4d5d75c8752d83ae6bb1d44e0186"},"headline":"Building Responsible Language Models with the LangTest Library","datePublished":"2024-11-05T19:09:54+00:00","dateModified":"2026-02-19T11:31:15+00:00","mainEntityOfPage":{"@id":"https:\/\/pacific.ai\/building-responsible-language-models-with-the-langtest-library\/"},"wordCount":1496,"publisher":{"@id":"https:\/\/pacific.ai\/staging\/3667\/#organization"},"image":{"@id":"https:\/\/pacific.ai\/building-responsible-language-models-with-the-langtest-library\/#primaryimage"},"thumbnailUrl":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2024\/12\/4-3.webp","articleSection":["Articles"],"inLanguage":"en"},{"@type":"WebPage","@id":"https:\/\/pacific.ai\/building-responsible-language-models-with-the-langtest-library\/","url":"https:\/\/pacific.ai\/building-responsible-language-models-with-the-langtest-library\/","name":"Building Responsible Language Models with the LangTest Library - Pacific AI","isPartOf":{"@id":"https:\/\/pacific.ai\/staging\/3667\/#website"},"primaryImageOfPage":{"@id":"https:\/\/pacific.ai\/building-responsible-language-models-with-the-langtest-library\/#primaryimage"},"image":{"@id":"https:\/\/pacific.ai\/building-responsible-language-models-with-the-langtest-library\/#primaryimage"},"thumbnailUrl":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2024\/12\/4-3.webp","datePublished":"2024-11-05T19:09:54+00:00","dateModified":"2026-02-19T11:31:15+00:00","description":"Automatically generate test 
cases, run tests, and augment training datasets with the open-source, easy-to-use, cross-library LangTest package","breadcrumb":{"@id":"https:\/\/pacific.ai\/building-responsible-language-models-with-the-langtest-library\/#breadcrumb"},"inLanguage":"en","potentialAction":[{"@type":"ReadAction","target":["https:\/\/pacific.ai\/building-responsible-language-models-with-the-langtest-library\/"]}]},{"@type":"ImageObject","inLanguage":"en","@id":"https:\/\/pacific.ai\/building-responsible-language-models-with-the-langtest-library\/#primaryimage","url":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2024\/12\/4-3.webp","contentUrl":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2024\/12\/4-3.webp","width":550,"height":440,"caption":"Building responsible language models with the LangTest library, illustrating automated testing for bias, robustness, and safety in large language models to support trustworthy and governance-ready AI systems."},{"@type":"BreadcrumbList","@id":"https:\/\/pacific.ai\/building-responsible-language-models-with-the-langtest-library\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/pacific.ai\/"},{"@type":"ListItem","position":2,"name":"Building Responsible Language Models with the LangTest Library"}]},{"@type":"WebSite","@id":"https:\/\/pacific.ai\/staging\/3667\/#website","url":"https:\/\/pacific.ai\/staging\/3667\/","name":"Pacific AI","description":"","publisher":{"@id":"https:\/\/pacific.ai\/staging\/3667\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/pacific.ai\/staging\/3667\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en"},{"@type":"Organization","@id":"https:\/\/pacific.ai\/staging\/3667\/#organization","name":"Pacific 
AI","url":"https:\/\/pacific.ai\/staging\/3667\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/pacific.ai\/staging\/3667\/#\/schema\/logo\/image\/","url":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2025\/06\/site_logo.svg","contentUrl":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2025\/06\/site_logo.svg","width":182,"height":41,"caption":"Pacific AI"},"image":{"@id":"https:\/\/pacific.ai\/staging\/3667\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/people\/Pacific-AI\/61566807347567\/","https:\/\/www.linkedin.com\/company\/pacific-ai\/"]},{"@type":"Person","@id":"https:\/\/pacific.ai\/staging\/3667\/#\/schema\/person\/8a2b4d5d75c8752d83ae6bb1d44e0186","name":"David Talby","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2025\/03\/David_portret-96x96.webp","url":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2025\/03\/David_portret-96x96.webp","contentUrl":"https:\/\/pacific.ai\/staging\/3667\/wp-content\/uploads\/2025\/03\/David_portret-96x96.webp","caption":"David Talby"},"description":"David Talby is a CTO at Pacific AI, helping healthcare &amp; life science companies put AI to good use. David is the creator of Spark NLP \u2013 the world\u2019s most widely used natural language processing library in the enterprise. He has extensive experience building and running web-scale software platforms and teams \u2013 in startups, for Microsoft\u2019s Bing in the US and Europe, and to scale Amazon\u2019s financial systems in Seattle and the UK. 
David holds a PhD in computer science and master\u2019s degrees in both computer science and business administration.","sameAs":["https:\/\/www.linkedin.com\/in\/davidtalby\/"],"url":"https:\/\/pacific.ai\/staging\/3667\/author\/david\/"}]}},"_links":{"self":[{"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/posts\/198","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/comments?post=198"}],"version-history":[{"count":13,"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/posts\/198\/revisions"}],"predecessor-version":[{"id":2058,"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/posts\/198\/revisions\/2058"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/media\/334"}],"wp:attachment":[{"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/media?parent=198"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/categories?post=198"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pacific.ai\/staging\/3667\/wp-json\/wp\/v2\/tags?post=198"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}