Current US legislation prohibits discrimination and bias in AI applications used for recruiting, healthcare, and advertising.
This requires organizations that deploy such systems to test and prove that their solutions are robust and unbiased – in the same way they are required to comply with security and privacy regulations. This session introduces Pacific AI, a no-code tool built on top of the LangTest library, which applies Generative AI to:
- Automatically generate tests for accuracy, robustness, bias, and fairness for text classification and entity recognition tasks
- Automatically run test suites, create detailed model report cards, and compare different models against the same test suite (a minimal usage sketch follows this list)
- Publish, share, and reuse AI test suites across teams and projects
- Automatically generate synthetic training data to augment model training and minimize common model bias and reliability issues
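Pacific AI itself is no-code, but because it is built on the open-source LangTest library, the workflow it automates can be sketched in a few lines of Python. This is a minimal sketch, not the session's own code: the model checkpoint is an illustrative Hugging Face example, and exact method signatures may vary across LangTest versions.

```python
from langtest import Harness  # open-source library underlying Pacific AI

# Build a test harness for a text-classification model. The checkpoint
# below is an illustrative public model, not one endorsed by the session.
harness = Harness(
    task="text-classification",
    model={"model": "distilbert-base-uncased-finetuned-sst-2-english",
           "hub": "huggingface"},
)

harness.generate()   # auto-generate accuracy/robustness/bias/fairness tests
harness.run()        # execute the generated test suite against the model
harness.report()     # summarize pass rates as a model report card

harness.save("my_test_suite")  # persist the suite for sharing and reuse
```

The same generate/run/report loop applies to entity recognition by changing the `task` argument, and the saved suite is what makes publishing and reuse across teams possible.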
The session then shows how John Snow Labs uses Pacific AI to test and improve its own healthcare-specific language models.
FAQ
What can automated governance tools test in generative AI systems?
They can evaluate accuracy, robustness (e.g., typo tolerance), bias, and fairness for tasks like text classification and entity recognition using predefined or custom test suites.
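As a concrete illustration, a LangTest-style configuration can scope a suite to exactly those categories. This is a sketch: the schema follows the open-source library's config format, but the individual test names and thresholds are illustrative and version-dependent.

```python
from langtest import Harness

# Illustrative config: which test categories to generate and the minimum
# pass rate or score each must reach. Test names are examples; consult the
# LangTest docs for the exact catalog in your version.
config = {
    "tests": {
        "defaults": {"min_pass_rate": 0.75},
        "robustness": {
            "add_typo": {"min_pass_rate": 0.70},   # typo tolerance
            "lowercase": {"min_pass_rate": 0.70},  # casing tolerance
        },
        "bias": {
            "replace_to_female_pronouns": {"min_pass_rate": 0.80},
        },
        "accuracy": {
            "min_f1_score": {"min_score": 0.80},
        },
        "fairness": {
            "min_gender_f1_score": {"min_score": 0.75},
        },
    }
}

# Attach the config to a harness for an entity-recognition model.
harness = Harness(
    task="ner",
    model={"model": "dslim/bert-base-NER", "hub": "huggingface"},
    config=config,
)
```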
How do tools generate test cases for bias and fairness automatically?
Generative AI produces synthetic variants (e.g., names, demographic profiles, adversarial prompts), enabling coverage of sensitive attributes such as ethnicity or age for extensive bias testing.
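To make the idea concrete, here is a tiny, self-contained sketch of what such variant generation looks like. The template, name lists, and grouping are hypothetical and are not Pacific AI's or LangTest's own data.

```python
# Hypothetical sketch: produce name-swapped variants of one prompt so a
# classifier can be checked for demographic consistency.
TEMPLATE = "{name} has 5 years of experience and applied for the role."
NAMES_BY_GROUP = {          # illustrative name lists, not from the tool
    "group_a": ["Emily", "Greg"],
    "group_b": ["Lakisha", "Jamal"],
}

def bias_test_cases(template: str, names_by_group: dict[str, list[str]]):
    """Yield (group, text) pairs; a fair model should score them all alike."""
    for group, names in names_by_group.items():
        for name in names:
            yield group, template.format(name=name)

for group, text in bias_test_cases(TEMPLATE, NAMES_BY_GROUP):
    print(group, "->", text)
```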
Can you compare model versions using automated test suites?
Yes—these tools produce detailed report cards and support side-by-side model comparison on standardized test suites, tracking performance changes over time.
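A sketch of that comparison workflow, assuming the save/load pattern of the open-source LangTest library (the candidate model name is hypothetical, and the exact `load` signature may differ across versions):

```python
from langtest import Harness

# Evaluate a baseline model and persist the generated suite, so every
# future model version faces exactly the same test cases.
baseline = Harness(
    task="text-classification",
    model={"model": "distilbert-base-uncased-finetuned-sst-2-english",
           "hub": "huggingface"},
)
baseline.generate()            # build the test suite once
baseline.save("shared_suite")  # freeze it for reuse
baseline.run()
print(baseline.report())

# Reload the frozen suite against a candidate model (hypothetical name)
# and compare the two report cards side by side.
candidate = Harness.load(
    save_dir="shared_suite",
    task="text-classification",
    model={"model": "my-org/candidate-model", "hub": "huggingface"},
)
candidate.run()
print(candidate.report())
```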
How are accuracy and robustness evaluated in non-technical terms?
Tests simulate noisy inputs (e.g., typos, paraphrasing) and check whether model outputs remain correct or consistent, reporting a simple pass/fail result for each case.
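The pass/fail logic can be illustrated with a stand-alone sketch, independent of any library: perturb each input with a typo and count how often the model's prediction survives. The predictor here is a toy stand-in; any callable mapping text to a label would work.

```python
import random

# Illustrative robustness check, not the LangTest implementation: swap one
# adjacent character pair ("typo") and verify the label is unchanged.
def add_typo(text: str, rng: random.Random) -> str:
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def robustness_pass_rate(predict, texts, seed: int = 0) -> float:
    """Fraction of inputs whose label survives a random typo."""
    rng = random.Random(seed)
    passed = sum(predict(t) == predict(add_typo(t, rng)) for t in texts)
    return passed / len(texts)

# Toy stand-in predictor for demonstration only.
toy_predict = lambda t: "positive" if "good" in t.lower() else "negative"
print(robustness_pass_rate(toy_predict, ["This is good.", "Bad service."]))
```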
What benefits does automated testing bring to domain experts?
Domain specialists can create, run, and share tests without writing code, helping ensure that models in sensitive fields such as healthcare and recruiting meet fairness, bias-mitigation, and legal standards.
