Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Add Azure DB for PostgreSQL vector store #9326

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

lossyrob
Copy link
Contributor

Motivation and Context

This adds a VectorStore for Azure DB for PostgreSQL.

Currently, the only additional functionality this provides over the PostgresVectorStore is to easily connect to the database using Entra authentication. In the future, we will add capability to leverage the new DiskANN indexing.

Description

  • Leverage the ability to pass in a custom AsyncConnection class to psycopg_pool's AsyncConnectionPool. The custom AsyncEntraConnection class will identify if a azure.core.credentials TokenCredential or AsyncTokenCredential has been passed as a 'credential' connection argument, or supply the DefaultAzureCredentials if no user or password has been set. This will be used to set the username and password for the Entra user for each new connection the pool creates.
  • The AzureDBForPosgresSettings overrides methods from PostgresSettings to provide Entra authentication, allowing the user and password to not be set and generating the connection pool with AsyncEntraConnection
  • The tutorial was updated to allow the user to choose either AzureDBForPostgres or Postgres as the vector store.

TODO:

  • Unit tests
  • Integration tests

Contribution Checklist

@markwallace-microsoft markwallace-microsoft added python Pull requests for the Python Semantic Kernel memory labels Oct 18, 2024
@markwallace-microsoft
Copy link
Member

Python Unit Test Overview

Tests Skipped Failures Errors Time
2659 4 💤 0 ❌ 0 🔥 1m 22s ⏱️

@moonbox3
Copy link
Contributor

Hi @lossyrob, is this PR ready for a review?

@lossyrob lossyrob force-pushed the feature/azuredb-for-pg branch from c758b12 to 09477f1 Compare January 2, 2025 18:50
@markwallace-microsoft
Copy link
Member

markwallace-microsoft commented Jan 2, 2025

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
semantic_kernel/connectors/memory/azure_db_for_postgres
   __init__.py440%3–11
   azure_db_for_postgres_collection.py13130%2–50
   azure_db_for_postgres_settings.py40400%2–99
   azure_db_for_postgres_store.py330%3–9
   constants.py330%3–6
   entra_connection.py48480%2–81
TOTAL16860188089% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
2995 4 💤 0 ❌ 0 🔥 1m 17s ⏱️

@lossyrob
Copy link
Contributor Author

lossyrob commented Jan 2, 2025

@moonbox3 sorry for the late reply, I'm just getting back from parental leave. This is ready for review, but I'm still working on testing. I don't see a good way to unit test this code, as it really only adds the ability to create a custom psycopg connection class that uses Entra auth, which is hard to mock and have substantial tests around. I need to create integration tests, which will be more substantial. This will require a PostgreSQL flex server that uses Entra authentication, along with a service principal available to the integration test runner to utilize during the integration tests. Is a service principal already available in that environment? Would it be possible to get a small Azure DB for PostgreSQL flex server with entra auth enabled and the proper IAM roles assigned in the integration test environment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
memory python Pull requests for the Python Semantic Kernel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants