
Commit

Merge pull request #102 from cvs-health/release-branch/v0.3.2
Release PR: v0.3.2
dylanbouchard authored Jan 15, 2025
2 parents da2ee94 + d0dad1a commit 956e6d3
Showing 8 changed files with 83 additions and 11 deletions.
24 changes: 24 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,24 @@
## Description
<!--- Provide a general summary of your changes. -->
<!--- Mention related issues, pull requests, or discussions with #<issue/PR/discussion ID>. -->
<!--- Tag people for whom this PR may be of interest using @<username>. -->

## Contributor License Agreement
<!--- Select all that apply by putting an x between the brackets: [x] -->
- [ ] confirm you have signed the [LangFair CLA](https://forms.office.com/pages/responsepage.aspx?id=uGG7-v46dU65NKR_eCuM1xbiih2MIwxBuRvO0D_wqVFUMlFIVFdYVFozN1BJVjVBRUdMUUY5UU9QRS4u&route=shorturl)

## Tests
<!--- Select all that apply by putting an x between the brackets: [x] -->
- [ ] no new tests required
- [ ] new tests added
- [ ] existing tests adjusted

## Documentation
<!--- Select all that apply by putting an x between the brackets: [x] -->
- [ ] no documentation changes needed
- [ ] README updated
- [ ] API docs added or updated
- [ ] example notebook added or updated

## Screenshots
<!--- If applicable, please add screenshots. -->
18 changes: 16 additions & 2 deletions README.md
@@ -128,7 +128,7 @@ auto_object = AutoEval(
)
results = await auto_object.evaluate()
results['metrics']
- # Output is below
+ # # Output is below
# {'Toxicity': {'Toxic Fraction': 0.0004,
# 'Expected Maximum Toxicity': 0.013845130120171235,
# 'Toxicity Probability': 0.01},
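
The hunk above is truncated; for context, a minimal end-to-end sketch of the `AutoEval` call is below. The `langfair.auto` import path, the `prompts`/`langchain_llm` constructor arguments, and the use of `ChatOpenAI` are assumptions for illustration, not part of this diff.

```python
# Hypothetical sketch of the AutoEval flow excerpted above; the names flagged
# in the lead-in are assumptions, not taken from this commit.
import asyncio

from langchain_openai import ChatOpenAI   # any LangChain chat model should work
from langfair.auto import AutoEval        # assumed import path

async def main():
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=1.0)
    prompts = ["<use-case prompt 1>", "<use-case prompt 2>"]   # user-supplied prompts
    auto_object = AutoEval(prompts=prompts, langchain_llm=llm)
    results = await auto_object.evaluate()   # runs toxicity, stereotype, and counterfactual checks
    print(results["metrics"])                # nested dict like the commented output above

asyncio.run(main())
```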
@@ -199,7 +199,7 @@ Bias and fairness metrics offered by LangFair are grouped into several categorie


## 📖 Associated Research
- A technical description of LangFair's evaluation metrics and a practitioner's guide for selecting evaluation metrics is contained in **[this paper](https://arxiv.org/abs/2407.10853)**. If you use our framework for selecting evaluation metrics, we would appreciate citations to the following paper:
+ A technical description and a practitioner's guide for selecting evaluation metrics is contained in **[this paper](https://arxiv.org/abs/2407.10853)**. If you use our evaluation approach, we would appreciate citations to the following paper:

```bibtex
@misc{bouchard2024actionableframeworkassessingbias,
@@ -213,6 +213,20 @@ A technical description of LangFair's evaluation metrics and a practitioner's gu
}
```

A high-level description of LangFair's functionality is contained in **[this paper](https://arxiv.org/abs/2501.03112)**. If you use LangFair, we would appreciate citations to the following paper:

```bibtex
@misc{bouchard2025langfairpythonpackageassessing,
title={LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases},
author={Dylan Bouchard and Mohit Singh Chauhan and David Skarbrevik and Viren Bajaj and Zeya Ahmad},
year={2025},
eprint={2501.03112},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.03112},
}
```

## 📄 Code Documentation
Please refer to our [documentation site](https://cvs-health.github.io/langfair/) for more details on how to use LangFair.

3 changes: 1 addition & 2 deletions examples/evaluations/text_generation/auto_eval_demo.ipynb
@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "This notebook demonstrates the implementation of the `AutoEval` class. This class provides a user-friendly way to compute toxicity, stereotype, and counterfactual assessments for an LLM model. The user needs to provide the input prompts and model responses (optional), and the `AutoEval` class implements the following steps.\n",
+ "This notebook demonstrates the implementation of the `AutoEval` class. This class provides a user-friendly way to compute toxicity, stereotype, and counterfactual assessments for an LLM use case. The user needs to provide the input prompts and a `langchain` LLM, and the `AutoEval` class implements the following steps.\n",
"\n",
"1. Check Fairness Through Awareness (FTU)\n",
"2. If FTU is not satisfied, generate dataset for Counterfactual assessment \n",
@@ -61,7 +61,6 @@
"outputs": [],
"source": [
"# User to populate .env file with API credentials\n",
- "repo_path = '/'.join(os.getcwd().split('/')[:-3])\n",
"load_dotenv(find_dotenv())\n",
"\n",
"API_KEY = os.getenv('API_KEY')\n",
16 changes: 15 additions & 1 deletion langfair/generator/counterfactual.py
@@ -334,10 +334,18 @@ async def generate_responses(
----------
dict
A dictionary with two keys: 'data' and 'metadata'.
'data' : dict
A dictionary containing the prompts and responses.
'prompt' : list
A list of prompts.
'response' : list
A list of responses corresponding to the prompts.
'metadata' : dict
A dictionary containing metadata about the generation process.
'non_completion_rate' : float
The rate at which the generation process did not complete.
'temperature' : float
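
The return structure documented above corresponds roughly to a dictionary of the following shape; the values here are illustrative only.

```python
# Illustrative shape of the documented return value; values are made up.
result = {
    "data": {
        "prompt": ["prompt 1", "prompt 2"],        # one entry per generated response
        "response": ["response 1", "response 2"],
    },
    "metadata": {
        "non_completion_rate": 0.0,                # rate of generations that did not complete
        "temperature": 1.0,                        # remaining metadata fields are cut off in the hunk
    },
}
```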
@@ -433,16 +441,22 @@ def check_ftu(
-------
dict
A dictionary with two keys: 'data' and 'metadata'.
'data' : dict
- A dictionary containing the prompts and responses.
+ A dictionary containing the prompts and the attribute words they contain.
'prompt' : list
A list of prompts.
'attribute_words' : list
A list of attribute_words in each prompt.
'metadata' : dict
A dictionary containing metadata related to FTU.
'ftu_satisfied' : boolean
Boolean indicator of whether or not prompts satisfy FTU
'filtered_prompt_count' : int
The number of prompts that satisfy FTU.
"""
4 changes: 4 additions & 0 deletions langfair/generator/generator.py
@@ -197,14 +197,18 @@ async def generate_responses(
-------
dict
A dictionary with two keys: 'data' and 'metadata'.
'data' : dict
A dictionary containing the prompts and responses.
'prompt' : list
A list of prompts.
'response' : list
A list of responses corresponding to the prompts.
'metadata' : dict
A dictionary containing metadata about the generation process.
'non_completion_rate' : float
The rate at which the generation process did not complete.
'temperature' : float
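
As context for this docstring change, a minimal sketch of a `generate_responses` call is below; the `ResponseGenerator(langchain_llm=...)` constructor and the `count` argument are assumptions based on the package's documented examples, not part of this diff.

```python
# Hypothetical usage sketch; the constructor and argument names flagged in the
# lead-in are assumptions, not taken from this commit.
import asyncio

from langchain_openai import ChatOpenAI
from langfair.generator import ResponseGenerator   # assumed import path

async def main():
    generator = ResponseGenerator(langchain_llm=ChatOpenAI(model="gpt-4o-mini"))
    result = await generator.generate_responses(
        prompts=["prompt 1", "prompt 2"],   # user-supplied prompts
        count=2,                            # responses generated per prompt
    )
    print(result["metadata"]["non_completion_rate"])

asyncio.run(main())
```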
21 changes: 19 additions & 2 deletions langfair/metrics/classification/metrics/baseclass/metrics.py
@@ -13,7 +13,7 @@
# limitations under the License.

from abc import ABC, abstractmethod
- from typing import Optional
+ from typing import Optional,List

from numpy.typing import ArrayLike

@@ -38,7 +38,24 @@ def evaluate(
pass

@staticmethod
- def binary_confusion_matrix(y_true, y_pred):
+ def binary_confusion_matrix(y_true, y_pred) -> List[List[float]]:
"""
Method for computing binary confusion matrix
Parameters
----------
y_true : Array-like
Binary labels (ground truth values)
y_pred : Array-like
Binary model predictions
Returns
-------
List[List[float]]
2x2 confusion matrix
"""
cm = [[0, 0], [0, 0]]
for i in range(len(y_pred)):
if y_pred[i] == y_true[i]:
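
The hunk above cuts off mid-loop; a functionally equivalent sketch of a 2x2 confusion-matrix computation follows. The `cm[actual][predicted]` index convention is an assumption for illustration and may differ from the committed implementation.

```python
from typing import List

def binary_confusion_matrix_sketch(y_true, y_pred) -> List[List[float]]:
    """Count (actual, predicted) label pairs into a 2x2 matrix.

    Assumed layout: cm[0][0]=true negatives, cm[0][1]=false positives,
    cm[1][0]=false negatives, cm[1][1]=true positives.
    """
    cm = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        cm[int(actual)][int(predicted)] += 1
    return cm

# Example: y_true=[1, 0, 1, 1], y_pred=[1, 0, 0, 1] -> [[1, 0], [1, 2]]
```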
6 changes: 3 additions & 3 deletions poetry.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "langfair"
version = "0.3.1"
version = "0.3.2"
description = "LangFair is a Python library for conducting use-case level LLM bias and fairness assessments"
readme = "README.md"
authors = ["Dylan Bouchard <[email protected]>",
