Commit 4e18cc3

Merge branch 'main' into brian/ls-1478-basic-auth-docs-addition

bvs-langchain authored Aug 8, 2024
2 parents 09653e0 + 6594c4a

Showing 7 changed files with 318 additions and 0 deletions.
@@ -0,0 +1,42 @@
---
sidebar_position: 5
---

import {
  CodeTabs,
  PythonBlock,
  TypeScriptBlock,
} from "@site/src/components/InstructionsWithCode";

# Dynamic few shot example selection

:::note
This feature is currently in closed beta. Please sign up [here](https://forms.gle/in9R6t9HNSYMBt7P7) for access.
:::

Configure your datasets so that you can search them for few-shot examples based on an incoming request.

## Pre-conditions

1. Your dataset must have the KV store data type (we do not currently support chat model or LLM type datasets).
2. You must have an input schema defined for your dataset; a hypothetical schema is sketched after this list. See our docs on setting up schema validation [in our UI](./manage_datasets_in_application#dataset-schema-validation) for details.
3. You must be enabled for the closed beta.
4. You must be on LangSmith cloud.
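
For illustration, the input schema is a JSON Schema object describing the shape of each example's inputs. A sketch for a dataset whose examples each carry a single `question` string might look like the following (the `question` field name is an assumption for this example, not a requirement):

```json
{
  "type": "object",
  "properties": {
    "question": { "type": "string" }
  },
  "required": ["question"]
}
```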

## Index your dataset to be searched

In the datasets UI, click the `Few-shot Index` button in the top-right corner and hit `Start Sync`.

![Few-shot index button](../static/few-shot-index.png)

This kicks off a background process that indexes your data for search. A note on the modal indicates whether your index
is up to date and, if not, which version of your dataset was last indexed.

All new data added to your dataset is indexed automatically; you do not need to re-index after adding new examples.

## Search your dataset for similar examples

You can search your dataset for similar examples via the `POST /datasets/<id>/search` REST API endpoint, whose documentation
can be found [here](https://api.smith.langchain.com/docs#/datasets/search_api_v1_datasets__dataset_id__search_post).
You can see detailed examples of how to use this in your prompts, both with and without LangChain, in
[our cookbook on using indexed datasets with few-shot prompts](https://github.com/langchain-ai/langsmith-cookbook/blob/beta-few-shot-search-cookbook/optimization/dataset-few-shot-search/FewShotDatasetsQuickstart.ipynb).
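
Below is a minimal sketch of calling this endpoint with Python's `requests` library. The `inputs` and `limit` request fields and the `examples` key in the response are assumptions for illustration; check the API docs linked above for the authoritative schema.

```python
import os

import requests

dataset_id = "<<dataset-uuid>>"  # hypothetical placeholder for your dataset's ID

resp = requests.post(
    f"https://api.smith.langchain.com/api/v1/datasets/{dataset_id}/search",
    json={
        # The incoming request to find similar few-shot examples for
        "inputs": {"question": "What is the weather in San Francisco today?"},
        # Number of similar examples to return (assumed parameter name)
        "limit": 5,
    },
    headers={"x-api-key": os.environ["LANGCHAIN_API_KEY"]},
)
resp.raise_for_status()
for example in resp.json()["examples"]:  # assumed response shape
    print(example["inputs"])
```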
@@ -0,0 +1,271 @@
---
sidebar_position: 13
---

import {
  CodeTabs,
  PythonBlock,
  TypeScriptBlock,
} from "@site/src/components/InstructionsWithCode";
import { RegionalUrl } from "@site/src/components/RegionalUrls";

# Upload experiments run outside of LangSmith with the REST API

Some users prefer to manage their datasets and run their experiments outside of LangSmith, but want to use the LangSmith UI to view the results. This is supported via our `/datasets/upload-experiment` endpoint.

This guide shows how to upload experiments using the REST API, with the `requests` library in Python as an example; the same principles apply to any language.

## Request body schema

Uploading an experiment requires specifying the relevant high-level information for your experiment and dataset, along with the individual data for your examples and runs within
the experiment. Each object in the `results` list represents a "row" in the experiment: a single dataset example along with its associated run. Note that `dataset_id` and `dataset_name`
refer to your dataset's identifier in your external system and are used to group external experiments together in a single dataset. They should not refer to an existing dataset
in LangSmith (unless that dataset was created via this endpoint).

You may use the following schema to upload experiments to the `/datasets/upload-experiment` endpoint:

```json
{
  "experiment_name": "string (required)",
  "experiment_description": "string (optional)",
  "experiment_start_time": "datetime (required)",
  "experiment_end_time": "datetime (required)",
  "dataset_id": "uuid (optional - an external dataset id, used to group experiments together)",
  "dataset_name": "string (optional - must provide either dataset_id or dataset_name)",
  "dataset_description": "string (optional)",
  "experiment_metadata": { // Object (any shape - optional)
    "key": "value"
  },
  "summary_experiment_scores": [ // List of summary feedback objects (optional)
    {
      "key": "string (required)",
      "score": "number (optional)",
      "value": "string (optional)",
      "comment": "string (optional)",
      "feedback_source": { // Object (optional)
        "type": "string (required)"
      },
      "feedback_config": { // Object (optional)
        "type": "string enum: continuous, categorical, or freeform",
        "min": "number (optional)",
        "max": "number (optional)",
        "categories": [ // List of feedback category objects (optional)
          {
            "value": "number (required)",
            "label": "string (optional)"
          }
        ]
      },
      "created_at": "datetime (optional - defaults to now)",
      "modified_at": "datetime (optional - defaults to now)",
      "correction": "Object or string (optional)"
    }
  ],
  "results": [ // List of experiment row objects (required)
    {
      "row_id": "uuid (required)",
      "inputs": { // Object (required - any shape). This will
        "key": "val" // be the input to both the run and the dataset example.
      },
      "expected_outputs": { // Object (optional - any shape).
        "key": "val" // These will be the outputs of the dataset examples.
      },
      "actual_outputs": { // Object (optional - any shape).
        "key": "val" // These will be the outputs of the runs.
      },
      "evaluation_scores": [ // List of feedback objects for the run (optional)
        {
          "key": "string (required)",
          "score": "number (optional)",
          "value": "string (optional)",
          "comment": "string (optional)",
          "feedback_source": { // Object (optional)
            "type": "string (required)"
          },
          "feedback_config": { // Object (optional)
            "type": "string enum: continuous, categorical, or freeform",
            "min": "number (optional)",
            "max": "number (optional)",
            "categories": [ // List of feedback category objects (optional)
              {
                "value": "number (required)",
                "label": "string (optional)"
              }
            ]
          },
          "created_at": "datetime (optional - defaults to now)",
          "modified_at": "datetime (optional - defaults to now)",
          "correction": "Object or string (optional)"
        }
      ],
      "start_time": "datetime (required)", // The start/end times for the runs will be used to
      "end_time": "datetime (required)", // calculate latency. They must all fall between the
      "run_name": "string (optional)", // start and end times for the experiment.
      "error": "string (optional)",
      "run_metadata": { // Object (any shape - optional)
        "key": "value"
      }
    }
  ]
}
```

The response JSON is an object with keys `experiment` and `dataset`, each containing relevant information about the experiment and dataset that were created.

## Considerations

You may upload multiple experiments to the same dataset by providing the same `dataset_id` or `dataset_name` across multiple calls. Your experiments will be grouped together
under a single dataset, and you will be able to [use the comparison view to compare results between experiments](./compare_experiment_results).

Ensure that the start and end times of your individual rows are all between the start and end time of your experiment.

You must provide either a `dataset_id` or a `dataset_name`. If you provide only an ID and the dataset does not yet exist, we will generate a name for you, and vice versa if
you provide only a name.

You may not upload experiments to a dataset that was not created via this endpoint; uploading experiments is only supported for externally managed datasets.

## Example request

Below is a simple example call to the `/datasets/upload-experiment` endpoint, using only the most important fields as an illustration.

```python
import os
import requests

body = {
    "experiment_name": "My external experiment",
    "experiment_description": "An experiment uploaded to LangSmith",
    "dataset_name": "my-external-dataset",
    "summary_experiment_scores": [
        {
            "key": "summary_accuracy",
            "score": 0.9,
            "comment": "Great job!"
        }
    ],
    "results": [
        {
            "row_id": "<<uuid>>",
            "inputs": {
                "input": "Hello, what is the weather in San Francisco today?"
            },
            "expected_outputs": {
                "output": "Sorry, I am unable to provide information about the current weather."
            },
            "actual_outputs": {
                "output": "The weather is partly cloudy with a high of 65."
            },
            "evaluation_scores": [
                {
                    "key": "hallucination",
                    "score": 1,
                    "comment": "The chatbot made up the weather instead of identifying that "
                    "they don't have enough info to answer the question. This is "
                    "a hallucination."
                }
            ],
            "start_time": "2024-08-03T00:12:39",
            "end_time": "2024-08-03T00:12:41",
            "run_name": "Chatbot"
        },
        {
            "row_id": "<<uuid>>",
            "inputs": {
                "input": "Hello, what is the square root of 49?"
            },
            "expected_outputs": {
                "output": "The square root of 49 is 7."
            },
            "actual_outputs": {
                "output": "7."
            },
            "evaluation_scores": [
                {
                    "key": "hallucination",
                    "score": 0,
                    "comment": "The chatbot correctly identified the answer. This is not a "
                    "hallucination."
                }
            ],
            "start_time": "2024-08-03T00:12:40",
            "end_time": "2024-08-03T00:12:42",
            "run_name": "Chatbot"
        }
    ],
    "experiment_start_time": "2024-08-03T00:12:38",
    "experiment_end_time": "2024-08-03T00:12:43"
}

resp = requests.post(
    "https://api.smith.langchain.com/api/v1/datasets/upload-experiment",
    json=body,
    headers={"x-api-key": os.environ["LANGCHAIN_API_KEY"]}
)
print(resp.json())
```

Below is the response received:

```json
{
  "dataset": {
    "name": "my-external-dataset",
    "description": null,
    "created_at": "2024-08-03T00:36:23.289730+00:00",
    "data_type": "kv",
    "inputs_schema_definition": null,
    "outputs_schema_definition": null,
    "externally_managed": true,
    "id": "<<uuid>>",
    "tenant_id": "<<uuid>>",
    "example_count": 0,
    "session_count": 0,
    "modified_at": "2024-08-03T00:36:23.289730+00:00",
    "last_session_start_time": null
  },
  "experiment": {
    "start_time": "2024-08-03T00:12:38",
    "end_time": "2024-08-03T00:12:43+00:00",
    "extra": null,
    "name": "My external experiment",
    "description": "An experiment uploaded to LangSmith",
    "default_dataset_id": null,
    "reference_dataset_id": "<<uuid>>",
    "trace_tier": "longlived",
    "id": "<<uuid>>",
    "run_count": null,
    "latency_p50": null,
    "latency_p99": null,
    "first_token_p50": null,
    "first_token_p99": null,
    "total_tokens": null,
    "prompt_tokens": null,
    "completion_tokens": null,
    "total_cost": null,
    "prompt_cost": null,
    "completion_cost": null,
    "tenant_id": "<<uuid>>",
    "last_run_start_time": null,
    "last_run_start_time_live": null,
    "feedback_stats": null,
    "session_feedback_stats": null,
    "run_facets": null,
    "error_rate": null,
    "streaming_rate": null,
    "test_run_number": 1
  }
}
```

Note that the latency and feedback stats in the experiment results are null because the runs haven't been persisted yet, which may take a few seconds.
If you save the experiment ID and query again in a few seconds, you will see all the stats (although tokens/cost will still be null, because we don't ask for this
information in the request body).
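
As a sketch of how you might poll for those stats: experiments are backed by tracer sessions, so the snippet below assumes the experiment can be read back via a `GET /sessions/{id}` request with an `include_stats` query parameter (both are assumptions to verify against the API docs).

```python
import os
import time

import requests

# `resp` is the upload response from the example request above
experiment_id = resp.json()["experiment"]["id"]

time.sleep(5)  # give the runs a few seconds to be persisted

stats_resp = requests.get(
    f"https://api.smith.langchain.com/api/v1/sessions/{experiment_id}",
    params={"include_stats": "true"},  # assumed query parameter
    headers={"x-api-key": os.environ["LANGCHAIN_API_KEY"]},
)
experiment = stats_resp.json()
print(experiment["latency_p50"], experiment["feedback_stats"])
```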

## View the experiment in the UI

Now, log in to the UI and click on your newly created dataset. You should see a single experiment:
![Uploaded experiments table](./static/uploaded_dataset.png)

Your examples will have been uploaded:
![Uploaded examples](./static/uploaded_dataset_examples.png)

Clicking on your experiment will bring you to the comparison view:
![Uploaded experiment comparison view](./static/uploaded_experiment.png)

As you upload more experiments to your dataset, you will be able to compare the results and easily identify regressions in the comparison view.
5 changes: 5 additions & 0 deletions versioned_docs/version-2.0/how_to_guides/index.md
@@ -161,6 +161,11 @@ Evaluate your LLM applications to measure their performance over time.
- [Create a dataset](./how_to_guides/evaluation/run_evals_api_only#create-a-dataset)
- [Run a single experiment](./how_to_guides/evaluation/run_evals_api_only#run-a-single-experiment)
- [Run a pairwise experiment](./how_to_guides/evaluation/run_evals_api_only#run-a-pairwise-experiment)
- [Upload experiments run outside of LangSmith with the REST API](./how_to_guides/evaluation/upload_existing_experiments)
- [Request body schema](./how_to_guides/evaluation/upload_existing_experiments#request-body-schema)
- [Considerations](./how_to_guides/evaluation/upload_existing_experiments#considerations)
- [Example request](./how_to_guides/evaluation/upload_existing_experiments#example-request)
- [View the experiment in the UI](./how_to_guides/evaluation/upload_existing_experiments#view-the-experiment-in-the-ui)

## Human feedback
