update evaluation tutorial #614
def evaluate_length(run: Run, example: Example) -> dict:
    prediction = run.outputs.get("output") or ""
    required = example.outputs.get("answer") or ""
def evaluate_length(outputs: dict, reference_outputs: dict) -> dict:
could we just call this something like is_concise() and return the bool directly?
yeah going to call it "length" to match the key
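A minimal sketch of what this thread seems to be converging on: an evaluator named `length` that returns the bool directly. The `"response"` and `"answer"` keys and the 2x threshold are taken from the later snippet in this PR; the `.get(...) or ""` guards are carried over from the old version and are an assumption about how missing values should be handled.

```python
def length(outputs: dict, reference_outputs: dict) -> bool:
    """Return True when the response is shorter than twice the reference answer."""
    # Fall back to an empty string if a key is missing or None (assumed behavior).
    response = outputs.get("response") or ""
    answer = reference_outputs.get("answer") or ""
    return len(response) < 2 * len(answer)


# Quick check with made-up data: 5 characters vs. a 6-character reference.
print(length({"response": "Paris"}, {"answer": "Paris."}))
```

Keeping the function name `length` means the name and the feedback key mentioned above stay the same.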
# Note: If your system is async, you can use the asynchronous `aevaluate` function
# import asyncio
# from langsmith import aevaluate
#
# experiment_results = asyncio.run(aevaluate(
# experiment_results = asyncio.run(client.aevaluate(
#     my_async_langsmith_app, # Your AI system
#     data=dataset_name, # The data to predict and grade over
#     evaluators=[evaluate_length, qa_evaluator], # The evaluators to score the results
#     evaluators=[length, correctness], # The evaluators to score the results
#     experiment_prefix="openai-3.5", # A prefix for your experiment names to easily identify them
can we delete this comment? don't think it's needed in the tutorial
deleted, but would it be valuable to add a note about async compatibility somewhere? wouldn't want people turned away because they think we only do sync stuff.
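For reference, a rough sketch of the shape an async target would take if such a note were added. It deliberately does not call LangSmith: `my_async_app` and `run_all` are hypothetical stand-ins, and the plain `asyncio.gather` loop only illustrates running an async system over several examples, which is the case the deleted comment pointed to `aevaluate` for.

```python
import asyncio


async def my_async_app(inputs: dict) -> dict:
    """Hypothetical async target; the sleep stands in for an awaited model call."""
    await asyncio.sleep(0)  # placeholder for real async I/O
    return {"response": inputs["question"].upper()}


async def run_all(examples: list[dict]) -> list[dict]:
    # Run the target over every example concurrently; gather preserves input order.
    return await asyncio.gather(*(my_async_app(example) for example in examples))


results = asyncio.run(run_all([{"question": "hi"}, {"question": "ok"}]))
print(results)
```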
Co-authored-by: Bagatur <[email protected]>
def concision(outputs: dict, reference_outputs: dict) -> bool:
    return int(len(outputs["response"]) < 2 * len(reference_outputs["answer"]))
The screenshots all say length so we would need to retake all of them if we want to use concision as the name.