Auth Tech Spec
🚧 This spec is a work in progress.
⚠️ This spec describes the end goal of a system with both local authorization and third-party OAuth2 integration. Not all of this will be delivered in the first milestone. Milestones are defined here.
Phoenix has evolved from a notebook tool to an application backed by a database and deployed as an OCI container. Since building persistence into Phoenix, the most common ask from end users has been the ability to deploy Phoenix with authorization. While it's currently possible to secure an instance of Phoenix by deploying it behind a reverse-proxy and implementing custom authentication, this requires significant effort and expertise on the part of the user. Not only are spans, traces, and datasets potentially private, but certain planned features such as prompt playground require the storage of API keys. Building auth will allow users to easily and securely store sensitive data in deployed instances of Phoenix and will unlock development on a new set of features.
Our goal is to enable users to deploy authenticated instances of Phoenix in a straightforward and secure way.
Phoenix should be easy to use and should provide a great first-touch experience, including during setup and deployment. Deploying Phoenix is about as simple as possible, involving a single OCI container and an optional instance of Postgres. It should require minimal additional effort on the part of the user to deploy Phoenix with authorization.
We must also provide a secure solution that balances concerns for performance of the system as a whole.
Users should have the option but should not be required to use third-party authorization services with Phoenix. These services will be cost-prohibitive for some users.
It should be as simple as possible for users to re-establish security if their credentials are stolen or their database is exposed.
Persona | Use Case |
---|---|
Amy is a first-time user of Phoenix who is dogfooding notebooks in Colab. She does not have a specific use-case in mind, but expects the tools she uses to be straightforward and simple. | Amy is still being introduced to Phoenix as a product and has not yet learned that Phoenix is both a notebook tool and an application that can be self-hosted. She does not need any authentication while running in the notebook. |
Alex is a student and individual user of Phoenix working on a personal project. He has heard about Arize-Phoenix and its capabilities for experimentation and iteration. His project is in the early stages of development and runs entirely locally. | Alex is running Phoenix locally via the command line with python -m phoenix.server.main serve. He wants traces to be persisted to disk so that he can curate a dataset and iterate on his project over the course of several independent sessions. He neither wants nor expects Phoenix to come with any authentication. He appreciates how straightforward it is to use as a tool for local development. |
Betty works on a small team at a startup that has just deployed their first LLM application to production. They are early in their observability journey, but realize that they don't have a way to incrementally improve the performance of their application, and so are investigating tooling for instrumentation, datasets, and experiments. They care mainly about ease of use and a good onboarding experience. | Betty requires authentication since her traces contain private customer data, but her team doesn't want to pay for a third-party OAuth2 provider such as Auth0. Her team uses Grafana, and she wants a similar first-touch experience that requires no additional integration or setup. Her team likes Phoenix from a product perspective, but they think it's a hassle to set up a reverse proxy with their own auth. She doesn't even want to bother with an SMTP relay for resetting passwords and is willing to manually edit a database if someone on her team forgets their password. She is used to working with API keys when dealing with LLM inference services and expects to be able to configure her instrumentation code similarly. |
Brian was a participant at a hackathon sponsored by Arize and LlamaIndex. He heard the pitch for LlamaTrace, gave it a try, and is now a user and evangelist for both Phoenix and LlamaIndex in his company. He is using hosted Phoenix for a project at work. | Brian wants to demonstrate the value of using LlamaIndex with Phoenix to his team by building an assistant for his company's proprietary data. He has followed the LlamaTrace documentation for configuring his instrumentation, and expects everything to just work. He does not even realize that the authentication in his case is handled by a LlamaTrace service that is separate from Phoenix itself. |
Cathy is a long-time user of Phoenix whose team has deployed a self-hosted Phoenix behind a reverse proxy with their own authentication. She is excited to hear that Phoenix now natively supports authentication. Her company already uses Auth0 throughout the organization. | Cathy doesn't want just anyone with access to the domain where Phoenix is hosted to be able to create an account and sign in. Rather, she wants to use Auth0 to administrate who has access to Phoenix, so she has control over who can see traces and so that credentials for departing team members can easily be revoked. |
Carl works at a consulting firm helping enterprise companies adopt generative AI. His team is currently working with a client who is developing their first assistant. | Carl wants to deploy an instance of Phoenix to educate his client on observability and evaluation. He administrates his team of engineers in a third-party OAuth2 provider (e.g., Google Identity), but he also wants his clients to be able to create accounts and sign in with no additional help when provided with the link to the host instance of Phoenix. |
Diana is deeply familiar with Phoenix. She subscribes to release notes and has incorporated Phoenix deeply within her startup's infrastructure, going so far as to leverage undocumented REST and GraphQL APIs to build out bespoke workflows for datasets, experiments, and human annotations. The thing she values about Phoenix is its hackability. | Diana wants to be able to interact with Phoenix GraphQL and REST APIs in a secure manner. She not only wants API keys to be scoped to particular users, but she wants those keys to expire after a set amount of time. For things like annotations in particular, she wants to be able to attribute the source of the annotation to a particular user on her team. She uses GraphiQL and the Swagger UI for testing out requests as she is building. |
To make sure Phoenix is easy to deploy, users should not be required to configure third-party authorization services that introduce additional steps to the setup and deployment process. For this reason, Phoenix will offer a local authentication flow. Similarly, we should not require users to configure an SMTP server to perform basic operations such as resetting forgotten passwords.
To deploy Phoenix with local authentication, the user configures two additional environment variables:
- `PHOENIX_ENABLE_LOCAL_AUTH`: A boolean flag signaling whether to use local auth, defaulting to false. If true, `PHOENIX_SECRET` must also be set.
- `PHOENIX_SECRET`: A secret used to sign and validate tokens and API keys. We will provide a strong recommendation in our documentation to randomly generate the secret in a secure way (e.g., `openssl rand -base64 32`) and will require secrets to be of a certain minimum length to discourage weak secrets.
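As an illustration of the secret requirements above, here is a minimal Python sketch of generating a secret and enforcing a minimum length at startup. The function names and the `MIN_SECRET_LENGTH` value are hypothetical; the spec does not yet fix a minimum.

```python
import secrets

# Hypothetical minimum length; the spec only says "a certain length".
MIN_SECRET_LENGTH = 32


def generate_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random secret drawn from the OS CSPRNG."""
    return secrets.token_urlsafe(num_bytes)


def validate_secret(secret: str) -> None:
    """Reject secrets that are too short to be plausibly strong."""
    if len(secret) < MIN_SECRET_LENGTH:
        raise ValueError(
            f"PHOENIX_SECRET must be at least {MIN_SECRET_LENGTH} characters long"
        )


secret = generate_secret()
validate_secret(secret)  # passes: 32 random bytes encode to 43 characters
```

A length check cannot detect a long but guessable secret, so the documentation recommendation to generate the value randomly still matters.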
When Phoenix spins up for the first time, the user will be shown a sign-up form with instructions to enter their username, email address, and password, and will be told that they will be the first admin user. Passwords shorter than some minimum number of characters will be rejected. After entering their credentials, they will be redirected to the home page.
Admin users will have access to a user administration page accessible via the left sidebar that will show all users in a table. This page will allow admins to:
- create new users
- view profiles for existing users, including username, email address, and role (admin vs. non-admin)
- modify profiles for existing users, including username, email address, role, and password
- delete existing users
Admins will be able to change their own role and the roles of other users, converting non-admins to admins and vice-versa. We will guarantee that there is always at least one admin by preventing actions that would remove the last admin.
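The last-admin guarantee could be enforced with a check that runs before any role change or deletion. The sketch below is one possible shape, with a hypothetical `User` type and role names taken from the admin vs. non-admin distinction above; it is not the actual Phoenix data model.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    role: str  # "admin" or "general" (hypothetical role names)


class LastAdminError(Exception):
    """Raised when an action would leave the instance without any admin."""


def ensure_admin_remains(users: list[User], target_id: int) -> None:
    """Raise if demoting or deleting `target_id` would remove the last admin."""
    remaining_admins = [
        user for user in users if user.role == "admin" and user.id != target_id
    ]
    if not remaining_admins:
        raise LastAdminError("at least one admin must remain")


users = [User(id=1, role="admin"), User(id=2, role="general")]
ensure_admin_remains(users, target_id=2)  # fine: user 1 is still an admin
```

In a real deployment this check and the subsequent write would need to happen in the same database transaction so that two concurrent demotions cannot both succeed.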
Admins and non-admins alike will be able to view and change their own profile information, including their username, email address, and password, via a profile page. When the user signs in via an OAuth2 integration.
Users will configure SMTP relay services via the following environment variables:
- `PHOENIX_ENABLE_SMTP`: boolean flag, whether to enable SMTP
- `PHOENIX_SMTP_HOST`: host of SMTP server
- `PHOENIX_SMTP_USER`: email address to authenticate
- `PHOENIX_SMTP_PASSWORD`: password to authenticate
- `PHOENIX_SMTP_FROM_ADDRESS`: address used when sending out emails (defaults to `PHOENIX_SMTP_USER`)
- `PHOENIX_SMTP_FROM_NAME`: name used when sending emails (defaults to "Arize-Phoenix")
For example,
PHOENIX_ENABLE_SMTP=true
PHOENIX_SMTP_HOST=smtp.example.com
PHOENIX_SMTP_USER=<email address>
PHOENIX_SMTP_PASSWORD=supersecretpassword
PHOENIX_SMTP_FROM_ADDRESS=<email address>
PHOENIX_SMTP_FROM_NAME=Phoenix App
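The defaults described for these variables could be resolved roughly as follows. This is a sketch only: `SmtpConfig` and `load_smtp_config` are hypothetical names, and treating only the literal string "true" as enabling SMTP is an assumption about how the boolean flag would be parsed.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SmtpConfig:
    host: str
    user: str
    password: str
    from_address: str
    from_name: str


def load_smtp_config(env: Mapping[str, str] = os.environ) -> Optional[SmtpConfig]:
    """Return the SMTP settings, or None when SMTP is not enabled."""
    # Assumption: only "true" (case-insensitive) enables SMTP.
    if env.get("PHOENIX_ENABLE_SMTP", "false").strip().lower() != "true":
        return None
    user = env["PHOENIX_SMTP_USER"]
    return SmtpConfig(
        host=env["PHOENIX_SMTP_HOST"],
        user=user,
        password=env["PHOENIX_SMTP_PASSWORD"],
        # The from-address falls back to the authenticating user.
        from_address=env.get("PHOENIX_SMTP_FROM_ADDRESS", user),
        from_name=env.get("PHOENIX_SMTP_FROM_NAME", "Arize-Phoenix"),
    )
```

Passing the environment as a mapping keeps the function easy to exercise in tests without mutating `os.environ`.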
SMTP relays will just be for resetting passwords at first, but we may find additional use for them down the line.
When an SMTP relay is configured, users will be able to click on a reset password link on the login page, which will prompt the user to enter their username or email address. If the input is a valid username or email address, a password recovery email will be sent to the user containing:
- a button that links to the reset page
- an anchor link with explicit instructions to copy and paste it
- a mention of how long the user has to reset their password
The link will have a short lifespan (e.g., 15 minutes) and will be usable only once.
When no SMTP relay is configured, users will be able to recover their passwords with the help of an admin by:
- requesting that an admin change their password to a temporary password
- logging in with the temporary password
- changing their password in the UI
Alternatively, instead of allowing admins to change user passwords directly, we can add a button by each user in the user administration table to allow admins to generate one-time password recovery links to send to users.
As a last resort, admins should be able to reset their own forgotten passwords by manually editing the database. This is mainly a documentation task. Utility functions for generating salts and computing password hashes will be documented. We will provide explicit instructions for an admin to:
- find their row in the `users` table and copy their salt
- compute the hashed password with their salt and the new password of their choice
- update their hashed password in the `users` table
We will provide a guide to follow when admins suspect their database has been compromised. In this case, an admin should first reset the secret to a new, randomly generated value. Doing this will invalidate all previously issued API keys and all passwords. We will provide instructions and code snippets for how to compute password hashes given a salt and password. The admin can then compute and update their own password hash in the `users` table to regain access to the UI and reset user passwords via the user administration page.
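A documented utility for the manual recovery steps might look like the sketch below. It assumes PBKDF2 with SHA256 (one of the candidate algorithms in this spec) and hex-encoded values in the database; the function names, the iteration count, and the encoding are all hypothetical until the hashing scheme is finalized.

```python
import hashlib
import secrets

# Hypothetical work factor; the spec does not fix an iteration count.
ITERATIONS = 600_000


def generate_salt(num_bytes: int = 16) -> bytes:
    """Return a random salt from the OS CSPRNG."""
    return secrets.token_bytes(num_bytes)


def compute_password_hash(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte hash from the password and salt with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


# An admin would substitute the salt copied from their row, or generate a new one.
salt = generate_salt()
print("salt:", salt.hex())
print("hash:", compute_password_hash("new-password-of-choice", salt).hex())
```

The printed values are what the admin would write back to their row, in whatever encoding the final schema uses.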
Phoenix's REST and GraphQL APIs must be accessible not only via a user session, but also programmatically. To facilitate this, Phoenix will issue two kinds of API keys: user API keys and system API keys. As the name suggests, user API keys are associated with and act on behalf of the user to which they are issued. That user has the ability to view and delete their own user keys, and if the user is deleted, so are all of their associated user keys. A user might create their own user key in order to run an experiment in a notebook, for example.
System keys, in contrast, act on behalf of the system as a whole rather than any particular user. They can only be created by admins, are not meaningfully associated with the admin who creates them except for auditing purposes, and do not disappear if that admin is deleted. A system key would be the recommended kind of key to use in programmatic interactions with Phoenix that do not involve a user (e.g., automated flows querying our REST APIs).
General users will be able to manage their user API keys via an API key page, where they can:
- create new API keys with:
  - name
  - optional description
  - optional lifespan
- view existing API keys:
  - name
  - description
  - last four characters of the key (to help disambiguate between multiple keys)
  - expiration date
  - whether the key is still valid (keys can be expired or can be invalid because the `PHOENIX_SECRET` was changed)
- delete API keys
Admins will be able to view a separate page showing all system keys.
Requirements:
- Users will be able to copy API keys upon creation, but the full key should not be visible after that.
- Users should be able to delete individual keys without affecting other keys.
- Changing the `PHOENIX_SECRET` should invalidate all previously issued API keys. This provides a simple mechanism to secure the API in the event of an attack.
Users will need to set their API key anytime they interact programmatically with Phoenix APIs.
Users will be able to set their API key using a `PHOENIX_API_KEY` environment variable when using `phoenix.Client` and our experiments API. We may wish to add an `api_key` parameter to `phoenix.Client`, but there is not currently a user-facing client for the experiments API, so we may wish to rely purely on environment variables for the sake of consistency.
Raw requests to Phoenix REST or GraphQL APIs will need to attach the API key as a header (e.g., `X-API-Key`).
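To make the raw-request case concrete, the following sketch builds (but does not send) a GraphQL request carrying the key in a header, using only the Python standard library. The `X-API-Key` header name is the example given above and may change; the `/graphql` path, the local base URL, and the `build_graphql_request` helper are assumptions for illustration.

```python
import json
import os
import urllib.request


def build_graphql_request(
    base_url: str, query: str, api_key: str
) -> urllib.request.Request:
    """Build a POST request to the GraphQL endpoint with the API key attached."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/graphql",  # assumed endpoint path
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # example header name from the spec
        },
        method="POST",
    )


# Fall back to a placeholder so the sketch runs without the variable set.
api_key = os.environ.get("PHOENIX_API_KEY", "example-key")
request = build_graphql_request("http://localhost:6006", "{ __typename }", api_key)
# urllib.request.urlopen(request) would send it to a running instance.
```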
The user stories and user flows above show how the REST and GraphQL APIs both require session-based and API key-based auth:
API | Session-Based | Key-Based |
---|---|---|
GraphQL | using the UI | programmatic access to GraphQL API |
REST | using the Swagger UI | using phoenix.Client or sending raw requests |
Access to these routes and resolvers will be controlled by `fastapi` dependencies and middleware. If we introduce scoped access at a later time (i.e., where users can access only a subset of our APIs), we can use `strawberry` permissions to provide more granular access to specific GraphQL resolvers.
Some non-sensitive routes will be left unsecured (e.g., `/healthz`, `/arize-phoenix-version`, static assets, etc.). We will add public `POST` routes for login and logout, e.g., `/login` and `/logout`. Since users need to be authenticated before they access our single-page React app, we'll need to serve a login form (i.e., templated HTML), e.g., on `GET /login`.
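The access rule described above (public routes pass through, everything else needs either a valid session or a valid API key) can be sketched independently of any web framework. The code below is not a `fastapi` dependency; it only shows the decision logic such a dependency or middleware might delegate to. The cookie name, the static-asset prefix, and the validator callables are hypothetical.

```python
from typing import Callable, Mapping

# Public routes named in the spec, plus the proposed login/logout routes.
PUBLIC_PATHS = {"/healthz", "/arize-phoenix-version", "/login", "/logout"}
STATIC_PREFIX = "/static/"  # hypothetical location of static assets


def is_public(path: str) -> bool:
    return path in PUBLIC_PATHS or path.startswith(STATIC_PREFIX)


def is_request_allowed(
    path: str,
    headers: Mapping[str, str],  # assumed to have lowercase keys
    cookies: Mapping[str, str],
    session_is_valid: Callable[[str], bool],
    api_key_is_valid: Callable[[str], bool],
) -> bool:
    """Allow public routes, then session-based auth, then key-based auth."""
    if is_public(path):
        return True
    session_token = cookies.get("session")  # hypothetical cookie name
    if session_token is not None and session_is_valid(session_token):
        return True
    api_key = headers.get("x-api-key")
    return api_key is not None and api_key_is_valid(api_key)
```

Keeping the rule in one function means the REST routes and the GraphQL endpoint can share it, which matches the table above where both APIs accept both kinds of auth.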
🚧 This section is a work in progress.
🧂 Recommendation: Use a single salt, `PHOENIX_SECRET`
We will use so-called password "salting" to securely verify user credentials. Hashing avoids storing passwords in plaintext, and salting mitigates attacks that use pre-computed hash tables. Salting passwords involves appending a random string of characters (called a "salt") to each password before hashing it. During verification, the system appends the salt to the input password, hashes the combination, and compares the result to the stored hash to see if they match. There are different strategies for choosing a salt. Some applications use a single salt via an environment variable (in our case, we would simply use `PHOENIX_SECRET`). The main advantage of this approach is that it provides a simple lever to pull to invalidate all passwords in the case of an attack (the same lever that invalidates all API keys).
Another strategy that is generally considered more secure is for the application to randomly generate salts on a per-user basis. This has a few advantages:
- it increases the cost of brute-force attacks since hashes need to be computed not just for one salt, but for many salts
- it ensures that every stored hash is unique even if multiple users have the same password, so a bad actor with database access cannot deduce that two users have the same password
In isolation, unique salts per user would be the way to go since they are no more complex to implement and are more resilient to brute-force attacks. However, this consideration is more important in the usual context where the authentication and resource databases are separate. In our case, they are one and the same, so if someone has compromised the `users` table containing hashed passwords and is running a brute-force attack over it, they've probably already compromised the tables containing sensitive data.
It would be possible to use a combination of both salting strategies, using both a system-wide salt and user-specific salts to gain the benefits of both. The benefit of doing this probably does not warrant the additional complexity.
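The difference between the two salting strategies can be seen in a small experiment. The sketch below assumes PBKDF2 with SHA256, which takes the salt as a separate argument rather than literally appending it to the password, and uses a deliberately low iteration count to keep the demonstration fast. The helper names and the stand-in secret are hypothetical.

```python
import hashlib
import hmac
import secrets


def hash_password(password: str, salt: bytes) -> bytes:
    # Low iteration count for demonstration only; not a recommended setting.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 10_000)


def verify_password(password: str, salt: bytes, expected_hash: bytes) -> bool:
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(hash_password(password, salt), expected_hash)


# Strategy 1: a single system-wide salt (a stand-in for PHOENIX_SECRET).
system_salt = b"stand-in-for-PHOENIX_SECRET"
alice_single = hash_password("hunter2", system_salt)
bob_single = hash_password("hunter2", system_salt)

# Strategy 2: a random salt per user, stored alongside the hash.
alice_salt, bob_salt = secrets.token_bytes(16), secrets.token_bytes(16)
alice_unique = hash_password("hunter2", alice_salt)
bob_unique = hash_password("hunter2", bob_salt)

print(alice_single == bob_single)  # True: equal passwords are visible in the table
print(alice_unique == bob_unique)  # False, barring a salt collision
```

With the single salt, changing the secret makes every stored hash unverifiable at once, which is the "lever" described above; with per-user salts, that lever does not exist unless both strategies are combined.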
🧮 Recommendation: Compute password hashes with PBKDF2 with SHA256 or scrypt (i.e., do as Django does)
Password hashes will be computed using a cryptographic hash function. The ideal hash function is slow enough to be resilient against brute-force attacks, but not so slow that it makes logging in noticeably laggy. An algorithm such as SHA256 is generally considered too fast and susceptible to brute-force attacks by itself. The default algorithm used by Django is PBKDF2 with SHA256, which runs SHA256 several hundred thousand times and is available in the Python standard library. A more recent alternative is scrypt, a cryptographic hash function that is tunable in terms of its computation time and memory usage and is also available in the Python standard library.
The following algorithms are supported by Django:
Cryptographic Hash Function | Adjustable Time | Adjustable Memory | In Python Standard Library |
---|---|---|---|
PBKDF2 with SHA256 | ✅ | ❌ | ✅ |
bcrypt (used by Auth0) | ✅ | ❌ | ❌ |
scrypt | ✅ | ✅ | ✅ |
argon2 | ✅ | ✅ | ❌ |
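For comparison with PBKDF2, here is a minimal sketch of hashing with scrypt from the Python standard library, showing the parameters that tune time and memory. The parameter values are common illustrative choices, not a recommendation from this spec, and `scrypt_hash` is a hypothetical helper name.

```python
import hashlib
import secrets


def scrypt_hash(
    password: str, salt: bytes, n: int = 2**14, r: int = 8, p: int = 1
) -> bytes:
    """Derive a 32-byte hash with scrypt.

    n is the CPU/memory cost (a power of two), r the block size, and p the
    parallelization factor. Memory use grows roughly as 128 * n * r bytes,
    which is about 16 MiB for the defaults here.
    """
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=n, r=r, p=p, dklen=32
    )


salt = secrets.token_bytes(16)
default_cost = scrypt_hash("correct horse battery staple", salt)
cheaper = scrypt_hash("correct horse battery staple", salt, n=2**12)
print(default_cost != cheaper)  # the cost parameters are part of the derivation
```

Because the parameters change the output, they need to be stored with each hash (or fixed globally) so that verification can repeat the same derivation. `hashlib.scrypt` is only available when Python is built against a sufficiently recent OpenSSL.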
Recommendation: Use JWTs for API keys rather than opaque strings
API keys can be opaque strings (e.g., randomly generated hashes) or can have self-encoded content (e.g., JWTs).
JWTs (JSON Web Tokens), and in particular JWSs (JSON Web Signatures), are tokens that contain:
- a base64url-encoded JSON header containing metadata
- a base64url-encoded JSON payload containing data
- a signature, a keyed hash of the encoded header and payload computed with a secret
A server (e.g., Phoenix) can issue a JWT signed with a secret (e.g., `PHOENIX_SECRET`) to a client. When the server receives a JWT from a client, it can recompute the signature to determine whether the JWT was signed using its own secret and whether the payload or header has been changed. This means that the server can know whether to trust the data in a token simply by inspecting the token itself without maintaining state about the tokens it has issued.
The JWT payload for a Phoenix API key might look like:
{
"sub": 19,
"iat": 1516239022,
"exp": 1516242622
}
where "sub", "iat", and "exp" are so-called "registered claims" standing for "subject", "issued at", and "expiration time", respectively. In this case, "sub" represents the ID of the API key in the database.
The main advantage of JWTs over opaque strings is that they allow several kinds of information to be checked without querying a database.
- Unlike opaque strings, which would require storing salted hashes in a database, JWTs can be validated without consulting a database.
- Expired tokens can be rejected simply by inspecting the token itself rather than retrieving an expiry from the database.
- In the future, we may add "scopes" to our API keys, so that certain scopes are required to access certain APIs (e.g., "read" vs. "write" scopes). If we use JWTs with a "scope" claim in the payload, server middleware can check whether an API key can use an API by simply inspecting the JWT.
Even if we use JWTs for our API keys, we will still need to maintain state on all issued tokens so that users can view details of their previously issued keys (name, description, expiration, validity).
When validating an API key, checking that the API key has not expired does not guarantee that the key is still valid, since it is possible that a user or admin revoked the key in the UI. See the section on caching below.
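The two-stage check implied here (reject expired keys from the token alone, then confirm the key has not been revoked) might be sketched as follows. The `active_key_ids` set is a hypothetical stand-in for whatever database lookup or cache ends up tracking non-revoked keys, and `claims` is assumed to be the payload of a token whose signature has already been verified.

```python
def is_api_key_active(claims: dict, active_key_ids: set, now: float) -> bool:
    """Check expiry first (stateless), then revocation (stateful)."""
    expires_at = claims.get("exp")
    if expires_at is not None and expires_at <= now:
        return False  # rejected without consulting any stored state
    # "sub" is the ID of the API key; revoked keys are absent from the set.
    return claims.get("sub") in active_key_ids


active_key_ids = {19, 23}
claims = {"sub": 19, "iat": 1516239022, "exp": 1516242622}
print(is_api_key_active(claims, active_key_ids, now=1516240000))  # True
active_key_ids.discard(19)  # the user deletes the key in the UI
print(is_api_key_active(claims, active_key_ids, now=1516240000))  # False
```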
The password recovery URL sent via a configured SMTP relay will have as a query parameter a JWT containing the user ID and a short lifespan (e.g., 15 minutes). Its payload might look like:
{
"sub": 19,
"iat": 1516239022,
"exp": 1516242622
}
In this case, "sub" refers to the database ID of the user. Once again, JWTs are preferred to randomly generated hashes because they enable expired and invalid tokens to be rejected without querying the database.
🚧 This section is in progress.
We will add several new database tables.
- A `user_roles` table that will initially define three roles:
  - system: the role taken by the system user described below
  - admin
  - general: non-admin users
- A `users` table with a foreign key relation to the `user_roles` table
  - The `users` table will come pre-populated with a system user, the only user in the table granted the system role, who is the user associated with system API keys.
  - There may also be an initial admin user with a default password (if we wish to support this first-touch flow).
- An `api_keys` table containing both kinds of API keys (system and user). This table includes information such as name, description, and expiry that will be displayed in the UI.
- Separate tables for access, refresh, and password reset tokens. The motivation for using separate tables here is to be able to apply different retention policies on each. The purpose of these tables is to track whether individual tokens have been used, since they must be single-use only.
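One way to picture the first three tables is the SQLite sketch below, run in memory from Python. Only the table names `user_roles`, `users`, and `api_keys` and the three role names come from this spec; every column name and constraint is a hypothetical placeholder, and the token tables are omitted.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user_roles (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    user_role_id INTEGER NOT NULL REFERENCES user_roles (id),
    username TEXT NOT NULL UNIQUE,
    email TEXT UNIQUE,
    password_hash BLOB,
    password_salt BLOB
);
CREATE TABLE api_keys (
    id INTEGER PRIMARY KEY,
    -- Deleting a user deletes their keys; system keys hang off the system user.
    user_id INTEGER NOT NULL REFERENCES users (id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    description TEXT,
    expires_at TIMESTAMP
);
INSERT INTO user_roles (name) VALUES ('system'), ('admin'), ('general');
-- Pre-populate the single system user.
INSERT INTO users (user_role_id, username)
    SELECT id, 'system' FROM user_roles WHERE name = 'system';
"""

connection = sqlite3.connect(":memory:")
connection.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
connection.executescript(SCHEMA)

roles = [row[0] for row in connection.execute("SELECT name FROM user_roles ORDER BY id")]
print(roles)  # ['system', 'admin', 'general']
```

Associating system keys with the pre-populated system user, rather than with the admin who created them, is what lets those keys survive the deletion of that admin.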
To avoid a long-lived feature branch and to give ourselves the flexibility to dogfood the table structure, we can try having a `PHOENIX_DANGEROUSLY_ENABLE_EXPERIMENTAL_AUTH` setting (a precursor to `PHOENIX_ENABLE_AUTH`) that would run the migration containing the new auth-related tables (we would configure this migration to not run otherwise).