
Commit

cr
bracesproul committed Apr 15, 2024
1 parent 4d74d6d commit b39d046
Showing 2 changed files with 19 additions and 20 deletions.
36 changes: 17 additions & 19 deletions docs/evaluation/faq/evaluator-implementations.mdx
@@ -53,11 +53,10 @@ Three QA evaluators you can load are: `"qa"`, `"context_qa"`, `"cot_qa"`. Based
- The `"qa"` evaluator ([reference](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.qa.eval_chain.QAEvalChain.html#langchain-evaluation-qa-eval-chain-qaevalchain)) instructs an llm to directly grade a response as "correct" or "incorrect" based on the reference answer.
- The `"context_qa"` evaluator ([reference](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.qa.eval_chain.ContextQAEvalChain.html#langchain.evaluation.qa.eval_chain.ContextQAEvalChain)) instructs the LLM chain to use reference "context" (provided throught the example outputs) in determining correctness. This is useful if you have a larger corpus of grounding docs but don't have ground truth answers to a query.
- The `"cot_qa"` evaluator ([reference](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.qa.eval_chain.CotQAEvalChain.html#langchain.evaluation.qa.eval_chain.CotQAEvalChain)) is similar to the "context_qa" evaluator, except it instructs the LLMChain to use chain of thought "reasoning" before determining a final verdict. This tends to lead to responses that better correlate with human labels, for a slightly higher token and runtime cost.
{" "}
{" "}
<CodeTabs
tabs={[
PythonBlock(`from langsmith import Client

<CodeTabs
tabs={[
PythonBlock(`from langsmith import Client
from langsmith.evaluation import LangChainStringEvaluator, evaluate\n
qa_evaluator = LangChainStringEvaluator("qa")
context_qa_evaluator = LangChainStringEvaluator("context_qa")
@@ -69,17 +68,16 @@ evaluate(
evaluators=[qa_evaluator, context_qa_evaluator, cot_qa_evaluator],
metadata={"revision_id": "the version of your pipeline you are testing"},
)`),
-]}
-groupId="client-language"
-/>
-You can customize the evaluator by specifying the LLM used to power its LLM
-chain or even by customizing the prompt itself. Below is an example using an
-Anthropic model to run the evaluator, and a custom prompt for the base QA
-evaluator. Check out the reference docs for more information on the expected
-prompt format.
-<CodeTabs
-tabs={[
-PythonBlock(`from langchain.chat_models import ChatAnthropic
+]}
+groupId="client-language"
+/>
+You can customize the evaluator by specifying the LLM used to power its LLM chain
+or even by customizing the prompt itself. Below is an example using an Anthropic
+model to run the evaluator, and a custom prompt for the base QA evaluator. Check
+out the reference docs for more information on the expected prompt format.
+<CodeTabs
+tabs={[
+PythonBlock(`from langchain.chat_models import ChatAnthropic
from langchain_core.prompts.prompt import PromptTemplate
from langsmith.evaluation import LangChainStringEvaluator\n
_PROMPT_TEMPLATE = """You are an expert professor specialized in grading students' answers to questions.
@@ -105,9 +103,9 @@ evaluate(
evaluators=[qa_evaluator, context_qa_evaluator, cot_qa_evaluator],
)
`),
-]}
-groupId="client-language"
-/>
+]}
+groupId="client-language"
+/>

## Criteria Evaluators (No Labels)

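The PythonBlock snippets in the hunks above are truncated where the diff is collapsed. As a point of reference, here is a minimal, self-contained sketch of the flow those snippets document, assembled from what is visible plus stated assumptions: the dataset name, the target function, and everything after the first line of `_PROMPT_TEMPLATE` are illustrative placeholders rather than content of this commit.

```python
# Hypothetical sketch assembled from the snippets visible in the diff above.
# The dataset name, target function, and full prompt body are placeholders.
from langchain.chat_models import ChatAnthropic
from langchain_core.prompts.prompt import PromptTemplate
from langsmith.evaluation import LangChainStringEvaluator, evaluate

# Off-the-shelf LLM-as-judge evaluators, as in the first code block of the diff.
qa_evaluator = LangChainStringEvaluator("qa")
context_qa_evaluator = LangChainStringEvaluator("context_qa")
cot_qa_evaluator = LangChainStringEvaluator("cot_qa")

# Custom grading prompt for the base "qa" evaluator. The QA eval chain expects
# the variables "query", "answer" (reference), and "result" (prediction).
# Only the first line below appears in the diff; the rest is illustrative.
_PROMPT_TEMPLATE = """You are an expert professor specialized in grading students' answers to questions.
You are grading the following question:
{query}
Here is the real answer:
{answer}
You are grading the following predicted answer:
{result}
Respond with CORRECT or INCORRECT:"""

PROMPT = PromptTemplate(
    input_variables=["query", "answer", "result"], template=_PROMPT_TEMPLATE
)

# Swap in an Anthropic model to power the evaluator's LLM chain
# (requires the `anthropic` package and ANTHROPIC_API_KEY).
eval_llm = ChatAnthropic(temperature=0.0)
custom_qa_evaluator = LangChainStringEvaluator(
    "qa", config={"llm": eval_llm, "prompt": PROMPT}
)

def predict(inputs: dict) -> dict:
    """Placeholder target; replace with the pipeline you are testing."""
    return {"output": "Toyota Camry"}

evaluate(
    predict,
    data="<your dataset name>",  # placeholder dataset name
    evaluators=[qa_evaluator, context_qa_evaluator, cot_qa_evaluator, custom_qa_evaluator],
    metadata={"revision_id": "the version of your pipeline you are testing"},
)
```

The `config` dict is what lets both the LLM and the prompt be swapped in: it is forwarded to the underlying LangChain evaluator loader when the string evaluator is constructed.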
3 changes: 2 additions & 1 deletion docs/tracing/faq/logging_and_viewing.mdx
@@ -52,7 +52,8 @@ Additionally, you will need to set `LANGCHAIN_TRACING_V2='true'` if you plan to

- LangChain (Python or JS)
- `@traceable` decorator or `wrap_openai` method in the Python SDK
-:::
+
+:::

<CodeTabs
tabs={[
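For context on the two Python SDK options named in the list above (`@traceable` and `wrap_openai`), here is a minimal sketch. It assumes `LANGCHAIN_TRACING_V2='true'`, `LANGCHAIN_API_KEY`, and `OPENAI_API_KEY` are set in the environment; the model name and question are placeholders.

```python
# Minimal sketch of the two Python SDK tracing options listed in the hunk above.
# Assumes LANGCHAIN_TRACING_V2='true', LANGCHAIN_API_KEY, and OPENAI_API_KEY are set.
import openai

from langsmith import traceable
from langsmith.wrappers import wrap_openai

# Option 1: wrap the OpenAI client so its calls are logged as LLM runs.
client = wrap_openai(openai.Client())

# Option 2: decorate your own functions so they are logged as chain runs.
@traceable(run_type="chain")
def answer(question: str) -> str:
    result = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model
        messages=[{"role": "user", "content": question}],
    )
    return result.choices[0].message.content

answer("What is LangSmith used for?")
```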
