OAI File Storage Metadata management #26
Requirements for OAI File Metadata Management
Description
Enhance the metadata management of files uploaded to the OAI Platform to improve UX and prevent duplication. This includes adding checks for file uploads using file hashes and storing comprehensive metadata such as the original download URL.
Acceptance Criteria
Initial plan for enhancing metadata management, handling duplicates, and improving the backend:
High-Level Plan
The OAI Platform files UX is pretty lacking. https://platform.openai.com/storage/files
Right now it shows the following information in the UX:
![image](https://private-user-images.githubusercontent.com/8903067/332833904-780d7f88-ab5a-405f-bba3-019618244c1f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg4Mzk1NTEsIm5iZiI6MTczODgzOTI1MSwicGF0aCI6Ii84OTAzMDY3LzMzMjgzMzkwNC03ODBkN2Y4OC1hYjVhLTQwNWYtYmJhMy0wMTk2MTgyNDRjMWYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIwNiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMDZUMTA1NDExWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MmZiMjRjMjFlYzhkYzg5M2FmYTRiMjUyZWE0ODdhNzVlY2I5ZjNlOWM0ZmI4NzBiYjNiZTQwODA4NDI2MDIzMiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.7Thg-Ytgq9KLFmv6Ehy6_ARnPh669nJ3F1U0OSv8to8)
We would like a better interface to manage files between our local file systems, git repos, Chainlit front end, &c, and to make sure we're not loading the same file in multiple places. Uploaded files often end up with random file names, and most of the other endpoints, such as vector stores and annotation file citations, refer to the OAI file ID. We need a way to manage all this better.
The vector storage has a slightly better interface:
![image](https://private-user-images.githubusercontent.com/8903067/332835371-0e80cac8-921d-43a9-890a-4f4fc31eba06.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg4Mzk1NTEsIm5iZiI6MTczODgzOTI1MSwicGF0aCI6Ii84OTAzMDY3LzMzMjgzNTM3MS0wZTgwY2FjOC05MjFkLTQzYTktODkwYS00ZjRmYzMxZWJhMDYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIwNiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMDZUMTA1NDExWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NDMzNTc2NGU2OTg0ZmIzY2RhNWRjNDFlZGIzNmNhYThhZDQ2MjcxOTA2OGVlMjYxYWQ3YmI0ZmRjMjNmZGZhMCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.ufxLefroMtVYOxeHasZNGaslTJUtwDYZ-xHzGoG4L4U)
It at least shows us the file names (most of the time) and now shows what assistants and threads a datastore is attached to.
I'm not sure I want to rebuild the OAI UX for all this, but we do need to check file hashes on upload so the same file isn't uploaded twice, and to record some sort of summary or descriptive details about each file and why it was added. These can be used for rollups and the like.
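As a rough sketch of what the hash check could look like (not a settled design): hash the file contents with SHA-256 and look the digest up in a registry before uploading. The names `sha256_of_file`, `upload_if_new`, the in-memory `registry` dict, and the `upload` callable are all hypothetical; `upload` stands in for whatever actually sends the file to the OAI Platform and returns its file ID.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_if_new(
    path: Path,
    registry: Dict[str, str],
    upload: Callable[[Path], str],
) -> str:
    """Upload `path` only if its content hash is not already registered.

    `registry` maps content hash -> OAI file ID (hypothetical local store).
    `upload` is a caller-supplied function that performs the real upload
    and returns the resulting file ID.
    """
    file_hash = sha256_of_file(path)
    if file_hash in registry:
        # Same bytes were uploaded before: reuse the existing file ID.
        return registry[file_hash]
    file_id = upload(path)
    registry[file_hash] = file_id
    return file_id
```

Keying on the content hash rather than the file name means two differently named copies of the same file resolve to one file ID, which is the duplication case described above. A real version would persist the registry somewhere durable instead of a dict.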
We might also use some of this metadata to store the original download URL or source. This is crucial once we start building ingestion pipelines for YouTube videos and other data sources that aren't natively supported by retrieval.
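One possible shape for that metadata, sketched as a small record keyed by OAI file ID (since the other endpoints refer to files by ID). The `FileRecord` class, its field names, and the JSON helpers are illustrative assumptions, not an agreed schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class FileRecord:
    """Hypothetical per-file metadata kept alongside the OAI file ID."""
    file_id: str                      # ID returned by the platform on upload
    sha256: str                       # content hash used for duplicate checks
    original_name: str                # file name before upload
    source_url: Optional[str] = None  # original download URL or source, if any
    description: str = ""             # summary of the file and why it was added


def dump_records(records: Dict[str, FileRecord]) -> str:
    """Serialize records (keyed by file ID) to a JSON string."""
    return json.dumps(
        {fid: asdict(rec) for fid, rec in records.items()},
        indent=2,
        sort_keys=True,
    )


def load_records(text: str) -> Dict[str, FileRecord]:
    """Rebuild records from the JSON produced by dump_records."""
    return {fid: FileRecord(**fields) for fid, fields in json.loads(text).items()}
```

Keeping `source_url` optional lets locally created files share the same record type as files fetched by an ingestion pipeline.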