Lifecycle rules for s3 and gcp (#424)
akira authored Sep 9, 2024
2 parents 6b4d011 + c357f77 commit 0f6308e
Showing 1 changed file with 73 additions and 0 deletions.
@@ -20,6 +20,7 @@ By default, LangSmith stores run inputs, outputs, and errors in ClickHouse. In a
- Currently, Azure Blob Storage is not supported (coming soon)
- A bucket/directory in your blob storage to store the data. We highly recommend creating a separate bucket/directory for LangSmith data.
- **If you are using TTLs**, you will need to set up a lifecycle policy to delete old data. For more information on configuring TTLs, see the [TTL guide](/self_hosting/configuration/ttl). These policies should mirror the TTLs you have set in your LangSmith configuration, or you may experience data loss.
  See [TTL Configuration](#ttl-configuration) for how to set up the lifecycle rules for TTLs in blob storage.
- Credentials to permit LangSmith Services to access the bucket/directory
- You will need to provide your LangSmith instance with the necessary credentials to access the bucket/directory. Read the [authentication section](#authentication) below for more information.
- An API URL for your blob storage service
@@ -111,3 +112,75 @@ If using an access key and secret, you can also provide an existing Kubernetes s
This is recommended over providing the access key and secret key directly in your config.

:::

## TTL Configuration

If you use the [TTL](/self_hosting/configuration/ttl) feature with LangSmith, you will also have to configure TTL rules for
your blob storage. Trace data in blob storage is written under a prefix path, and that prefix determines the TTL for the data.
When a trace's retention is extended, its blob storage path changes to the prefix that matches the new, longer retention.

The following TTL prefixes are used:

- `ttl_s/`: Short-term TTL, configured for 14 days.
- `ttl_l/`: Long-term TTL, configured for 400 days.

If you have customized the TTLs in your LangSmith configuration, you will need to adjust the TTLs in your blob storage configuration to match.

### Amazon S3

If you use S3 for your blob storage, you will need to set up a lifecycle configuration with filters that match the
prefixes above. You can find information on this [in the Amazon documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/intro-lifecycle-rules.html#intro-lifecycle-rules-filter).

As an example, if you are using Terraform to manage your S3 bucket, you would set up something like this:

```hcl
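# These rule blocks belong inside an aws_s3_bucket_lifecycle_configuration
# resource (AWS provider v4 or later), which uses `status` and `filter`.
rule {
  id     = "short-term-ttl"
  status = "Enabled"

  filter {
    prefix = "ttl_s/"
  }

  expiration {
    days = 14
  }
}

rule {
  id     = "long-term-ttl"
  status = "Enabled"

  filter {
    prefix = "ttl_l/"
  }

  expiration {
    days = 400
  }
}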
```
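
If you have customized the TTLs, one way to keep the S3 rules in step with your LangSmith configuration is to drive them from a single map. The following is a minimal sketch, not a definitive setup. It assumes AWS provider v4 or later and an existing bucket; the variable names `langsmith_bucket_name` and `langsmith_ttl_days`, the resource name `langsmith`, and the generated rule IDs are hypothetical.

```hcl
# Hypothetical variable names; adjust them to your own module.
variable "langsmith_bucket_name" {
  description = "Name of the existing S3 bucket that holds LangSmith data."
  type        = string
}

variable "langsmith_ttl_days" {
  description = "Blob prefix => retention in days. Must match your LangSmith TTL settings."
  type        = map(number)
  default = {
    "ttl_s/" = 14
    "ttl_l/" = 400
  }
}

resource "aws_s3_bucket_lifecycle_configuration" "langsmith" {
  bucket = var.langsmith_bucket_name

  # One expiration rule is generated per prefix in the map.
  dynamic "rule" {
    for_each = var.langsmith_ttl_days
    content {
      # Rule ID derived from the prefix, e.g. "expire-ttl_s".
      id     = "expire-${trimsuffix(rule.key, "/")}"
      status = "Enabled"

      filter {
        prefix = rule.key
      }

      expiration {
        days = rule.value
      }
    }
  }
}
```

With this layout, changing a retention period means editing one value in `langsmith_ttl_days` rather than several rule blocks.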

### Google Cloud Storage

You will need to set up lifecycle conditions for the GCS buckets you are using.
You can find information on this [in the Google documentation](https://cloud.google.com/storage/docs/lifecycle#conditions),
specifically the `matchesPrefix` condition.

As an example, if you are using Terraform to manage your GCS bucket, you would set up something like this:

```hcl
# These lifecycle_rule blocks belong inside a google_storage_bucket resource.
lifecycle_rule {
  condition {
    age            = 14
    matches_prefix = ["ttl_s/"]
  }
  action {
    type = "Delete"
  }
}

lifecycle_rule {
  condition {
    age            = 400
    matches_prefix = ["ttl_l/"]
  }
  action {
    type = "Delete"
  }
}
```
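
For customized TTLs on GCS, the rules can likewise be generated from one map so that each retention value is defined in a single place. This is a sketch under assumptions: it presumes a Google provider version that supports `matches_prefix`, and the variable names, the resource name `langsmith`, and the default location are hypothetical.

```hcl
# Hypothetical variable names; adjust them to your own module.
variable "langsmith_bucket_name" {
  description = "Name of the GCS bucket that holds LangSmith data."
  type        = string
}

variable "langsmith_bucket_location" {
  description = "Bucket location (placeholder default)."
  type        = string
  default     = "US"
}

variable "langsmith_ttl_days" {
  description = "Blob prefix => retention in days. Must match your LangSmith TTL settings."
  type        = map(number)
  default = {
    "ttl_s/" = 14
    "ttl_l/" = 400
  }
}

resource "google_storage_bucket" "langsmith" {
  name     = var.langsmith_bucket_name
  location = var.langsmith_bucket_location

  # One delete rule is generated per prefix in the map.
  dynamic "lifecycle_rule" {
    for_each = var.langsmith_ttl_days
    content {
      condition {
        age            = lifecycle_rule.value
        matches_prefix = [lifecycle_rule.key]
      }
      action {
        type = "Delete"
      }
    }
  }
}
```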
