John Snow Labs LangTest 1.8.0: Codebase Refactoring, Enhanced Debugging with Error Codes, Streamlined Categorization of Tasks, New Blog Posts, Improved Open Source Community Standards, and an Enhanced User Experience through Multiple Bug Fixes!
LangTest 1.8.0 Release by John Snow Labs
We're thrilled to unveil the latest advancements in LangTest with version 1.8.0. This release is centered around optimizing the codebase with extensive refactoring, enriching the debugging experience through the implementation of error codes, and enhancing workflow efficiency with streamlined task organization. The new categorization approach significantly improves the user experience, ensuring a more cohesive and organized testing process. This update also includes advancements in open source community standards, insightful blog posts, and multiple bug fixes, further solidifying LangTest's reputation as a versatile and user-friendly language testing and evaluation library.
Key Enhancements:
- **Optimized Codebase**: The codebase has been extensively refactored, resulting in more efficient and reliable testing processes.
- **Advanced Debugging Tools**: The introduction of error codes marks a significant enhancement to the debugging experience. Previously, the absence of standardized exceptions made error handling inconsistent and issues harder to identify and resolve. A unified set of standardized exceptions, tailored to specific error types and contexts, now makes troubleshooting more efficient and seamless (an illustrative sketch of this pattern follows this list).
- **Task Categorization**: This version introduces an improved task organization system for a more efficient and intuitive workflow. Previously, tests such as sensitivity, clinical tests, wino-bias, and many more were each treated as separate tasks; while comprehensive, this could fragment the workflow. The new categorization consolidates these tests into universally recognized NLP tasks, including Named Entity Recognition (NER), Text Classification, Question Answering, Summarization, Fill-Mask, Translation, and Text Generation. Treating individual tests as sub-categories within these broader NLP tasks enhances clarity and reduces potential overlap (see the Harness sketch after this list).
- **Open Source Community Standards**: With this release, we've strengthened community interactions by introducing issue templates, a code of conduct, and clear repository citation guidelines. The addition of GitHub badges enhances visibility and fosters a collaborative and organized community environment.
- **Parameter Standardization**: To bring uniformity to dataset organization and naming, this feature addresses the variation in dataset structures within the repository. By standardizing key parameters such as `data_source`, `split`, and `subset`, we ensure a consistent naming convention and organization across all datasets, enhancing clarity and efficiency in dataset usage (see the Harness sketch after this list).
Community Contributions:
Our team has published three enlightening blogs on Hugging Face's community platform, focusing on bias detection, model sensitivity, and data augmentation in NLP models:
- Detecting and Evaluating Sycophancy Bias: An Analysis of LLM and AI Solutions
- Unmasking Language Model Sensitivity in Negation and Toxicity Evaluations
- Elevate Your NLP Models with Automated Data Augmentation for Enhanced Performance
Don't forget to give the project a star on GitHub!
New LangTest blogs:
| New Blog Posts | Description |
|---|---|
| Evaluating Large Language Models on Gender-Occupational Stereotypes Using the Wino Bias Test | Delve into the evaluation of language models with LangTest on the WinoBias dataset, addressing AI biases in gender and occupational roles. |
| Streamlining ML Workflows: Integrating MLFlow Tracking with LangTest for Enhanced Model Evaluations | Discover how integrating MLFlow tracking with LangTest enhances transparency and enables systematic tracking of model evaluations. |
| Testing the Question Answering Capabilities of Large Language Models | Explore the complexities of evaluating Question Answering (QA) tasks using LangTest's diverse evaluation methods. |
| Evaluating Stereotype Bias with LangTest | Use the StereoSet dataset to assess bias related to gender, profession, and race. |
Bug Fixes
- Fixed templatic augmentations (PR #851)
- Resolved a bug in default configurations (PR #880)
- Addressed compatibility issues between OpenAI (version 1.1.1) and LangChain (PR #877)
- Fixed errors in sycophancy-test, factuality-test, and augmentation (PR #869)
What's Changed
- Fix/templatic augmentations by @ArshaanNazir in #851
- Refactor/report section by @ArshaanNazir in #860
- Integrating error codes by @Prikshit7766 in #867
- Refactor/delete dead code by @chakravarthik27 in #744
- updated Evaluation_Metrics notebook by @Prikshit7766 in #861
- fix rc errors by @ArshaanNazir in #868
- Update issue templates by @RakshitKhajuria in #862
- Created CODE_OF_CONDUCT.md by @RakshitKhajuria in #863
- Refactor/add configurable parameters by @alytarik in #866
- Added citation for the repo by @RakshitKhajuria in #871
- resolved: errors in sycophancy-test, factuality-test and augmentation. by @chakravarthik27 in #869
- Compatibility issue OpenAI (version 1.1.1) and Langchain by @Prikshit7766 in #877
- Feature/task categorization by @chakravarthik27 in #878
- Standardize qa dataset naming and structure by @Prikshit7766 in #876
- Investigate TestFactory.task for Task Transition Errors by @Prikshit7766 in #873
- updated wino evaluation by @RakshitKhajuria in #859
- Chore/notebook updates by @ArshaanNazir in #879
- Fix bug in default configs by @chakravarthik27 in #880
- fix default config by @ArshaanNazir in #881
- Fix: Update load_model method to accept a path instead in custom hub by @chakravarthik27 in #882
- Website Updates by @RakshitKhajuria in #875
- Release/1.8.0 by @ArshaanNazir in #883
Full Changelog: 1.7.0...v1.8.0