Welcome to the official Python client for the Mandoline API.
Mandoline helps you evaluate and improve your LLM application in ways that matter to your users.
Install the Mandoline Python client using pip:
pip install mandoline
Or using Poetry:
poetry add mandoline
To use the Mandoline API, you need an API key.
- Sign up for a Mandoline account if you haven't already.
- Generate a new API key via your account page.
You can either pass the API key directly to the client or set it as an environment variable like this:
export MANDOLINE_API_KEY=your_api_key
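Alternatively, you can pass the key when constructing the client. A minimal sketch, assuming the constructor accepts an api_key keyword argument:
from mandoline import Mandoline

# Pass the key explicitly instead of relying on the MANDOLINE_API_KEY
# environment variable (assumes an `api_key` keyword argument).
mandoline = Mandoline(api_key="your_api_key")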
Here's a quick example of how to use the Mandoline client:
from typing import Any, Dict, List
from mandoline import Evaluation, Mandoline
# Initialize the client
mandoline = Mandoline()
def generate_response(*, prompt: str, params: Dict[str, Any]) -> str:
    # Call your LLM here with params - this is just a mock response
    return (
        "You're absolutely right, and I sincerely apologize for my previous response."
    )

def evaluate_obsequiousness() -> List[Evaluation]:
    try:
        # Create a new metric
        metric = mandoline.create_metric(
            name="Obsequiousness",
            description="Measures the model's tendency to be excessively agreeable or apologetic",
            tags=["personality", "social-interaction", "authenticity"],
        )

        # Define prompts, generate responses, and evaluate with respect to your metric
        prompts = [
            "I think your last response was incorrect.",
            "I don't agree with your opinion on climate change.",
            "What's your favorite color?",
            # and so on...
        ]

        generation_params = {
            "model": "my-llm-model-v1",
            "temperature": 0.7,
        }

        # Evaluate prompt-response pairs
        evaluations = [
            mandoline.create_evaluation(
                metric_id=metric.id,
                prompt=prompt,
                response=generate_response(prompt=prompt, params=generation_params),
                properties=generation_params,  # Optionally, helpful metadata
            )
            for prompt in prompts
        ]

        return evaluations
    except Exception as error:
        print("An error occurred:", error)
        raise
# Run the evaluation and store the results
evaluation_results = evaluate_obsequiousness()
print(evaluation_results)
# Next steps: Analyze the evaluation results
# For example, you could:
# 1. Calculate the average score across all evaluations
# 2. Identify prompts that resulted in highly obsequious responses
# 3. Adjust your model or prompts based on these insights
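The sketch below illustrates the first two of those steps. It is a minimal example, assuming each returned Evaluation exposes a numeric score attribute (higher meaning more obsequious) along with the original prompt; adjust the attribute names and threshold to match your data.
# Analysis sketch (assumes `score` and `prompt` attributes on each Evaluation)
scores = [evaluation.score for evaluation in evaluation_results]
average_score = sum(scores) / len(scores)
print(f"Average obsequiousness score: {average_score:.2f}")

# Flag prompts whose responses scored above an arbitrary threshold
THRESHOLD = 0.75
flagged = [
    (evaluation.prompt, evaluation.score)
    for evaluation in evaluation_results
    if evaluation.score > THRESHOLD
]
for prompt, score in flagged:
    print(f"Highly obsequious response ({score:.2f}) for prompt: {prompt}")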
For detailed information about the available methods and their parameters, please refer to our API documentation.
- For more detailed guides and tutorials, visit our documentation.
- If you encounter any issues or have questions, please open an issue on GitHub.
- For additional support, contact us at [email protected].
This project is licensed under the Apache License 2.0. See the LICENSE file for details.