John Snow Labs LangTest 1.8.0: Codebase Refactoring, Enhanced Debugging with Error Codes, Streamlined Categorization of Tasks, New Blog Posts, Improved Open Source Community Standards, and an Enhanced User Experience through Multiple Bug Fixes!
LangTest 1.8.0 Release by John Snow Labs
We're thrilled to unveil the latest advancements in LangTest with version 1.8.0. This release is centered around optimizing the codebase with extensive refactoring, enriching the debugging experience through the implementation of error codes, and enhancing workflow efficiency with streamlined task organization. The new categorization approach significantly improves the user experience, ensuring a more cohesive and organized testing process. This update also includes advancements in open source community standards, insightful blog posts, and multiple bug fixes, further solidifying LangTest's reputation as a versatile and user-friendly language testing and evaluation library.
Key Enhancements:
- **Optimized Codebase**: The codebase has been extensively refactored, resulting in more efficient and reliable testing processes.
- **Advanced Debugging Tools**: The introduction of error codes marks a significant enhancement to the debugging experience. Previously, the absence of standardized exceptions made error handling inconsistent and issues harder to identify and resolve. A unified set of standardized exceptions, tailored to specific error types and contexts, now makes troubleshooting more efficient and seamless (an illustrative sketch of this pattern follows this list).
- **Task Categorization**: This version introduces an improved task organization system for a more efficient and intuitive workflow. Previously, tests such as sensitivity, clinical tests, wino-bias, and many more were each treated as separate tasks; while comprehensive, this could fragment the workflow. The new categorization consolidates these tests into universally recognized NLP tasks, including Named Entity Recognition (NER), Text Classification, Question Answering, Summarization, Fill-Mask, Translation, and Text Generation. Treating individual tests as sub-categories within these broader NLP tasks enhances clarity and reduces potential overlap (see the Harness sketch after this list).
- **Open Source Community Standards**: With this release, we've strengthened community interactions by introducing issue templates, a code of conduct, and clear repository citation guidelines. The addition of GitHub badges enhances visibility and fosters a collaborative and organized community environment.
- **Parameter Standardization**: To bring uniformity to dataset organization and naming, this feature addresses the variation in dataset structures within the repository. By standardizing key parameters such as `data_source`, `split`, and `subset`, we ensure a consistent naming convention and organization across all datasets, enhancing clarity and efficiency in dataset usage (see the Harness sketch after this list).
Community Contributions:
Our team has published three enlightening blogs on Hugging Face's community platform, focusing on bias detection, model sensitivity, and data augmentation in NLP models:
- Detecting and Evaluating Sycophancy Bias: An Analysis of LLM and AI Solutions
- Unmasking Language Model Sensitivity in Negation and Toxicity Evaluations
- Elevate Your NLP Models with Automated Data Augmentation for Enhanced Performance
Don't forget to give the project a star on GitHub!
New LangTest blogs:
| New Blog Posts | Description |
|---|---|
| Evaluating Large Language Models on Gender-Occupational Stereotypes Using the Wino Bias Test | Delve into the evaluation of language models with LangTest on the WinoBias dataset, addressing AI biases in gender and occupational roles. |
| Streamlining ML Workflows: Integrating MLFlow Tracking with LangTest for Enhanced Model Evaluations | Discover how integrating MLFlow tracking with LangTest enhances transparency and enables systematic tracking of model evaluations. |
| Testing the Question Answering Capabilities of Large Language Models | Explore the complexities of evaluating Question Answering (QA) tasks using LangTest's diverse evaluation methods. |
| Evaluating Stereotype Bias with LangTest | Use the StereoSet dataset to assess bias related to gender, profession, and race. |
Bug Fixes
- Fixed templatic augmentations (PR #851)
- Resolved a bug in default configurations (PR #880)
- Addressed compatibility issues between OpenAI (version 1.1.1) and LangChain (PR #877)
- Fixed errors in sycophancy-test, factuality-test, and augmentation (PR #869)
What's Changed
- Fix/templatic augmentations by @ArshaanNazir in #851
- Refactor/report section by @ArshaanNazir in #860
- Integrating error codes by @Prikshit7766 in #867
- Refactor/delete dead code by @chakravarthik27 in #744
- updated Evaluation_Metrics notebook by @Prikshit7766 in #861
- fix rc errors by @ArshaanNazir in #868
- Update issue templates by @RakshitKhajuria in #862
- Created CODE_OF_CONDUCT.md by @RakshitKhajuria in #863
- Refactor/add configurable parameters by @alytarik in #866
- Added citation for the repo by @RakshitKhajuria in #871
- resolved: errors in sycophancy-test, factuality-test and augmentation. by @chakravarthik27 in #869
- Compatibility issue OpenAI (version 1.1.1) and Langchain by @Prikshit7766 in #877
- Feature/task categorization by @chakravarthik27 in #878
- Standardize qa dataset naming and structure by @Prikshit7766 in #876
- Investigate TestFactory.task for Task Transition Errors by @Prikshit7766 in #873
- updated wino evaluation by @RakshitKhajuria in #859
- Chore/notebook updates by @ArshaanNazir in #879
- Fix bug in default configs by @chakravarthik27 in #880
- fix default config by @ArshaanNazir in #881
- Fix: Update load_model method to accept a path instead in custom hub by @chakravarthik27 in #882
- Website Updates by @RakshitKhajuria in #875
- Release/1.8.0 by @ArshaanNazir in #883
Full Changelog: 1.7.0...v1.8.0