feat: add support to pgvector #267

Open
wants to merge 12 commits into
base: main

kzamlynska
Collaborator

No description provided.

Contributor

github-actions bot commented Jan 8, 2025

Code Coverage Summary

Filename                                                                                                      Stmts    Miss  Cover    Missing
----------------------------------------------------------------------------------------------------------  -------  ------  -------  ---------------------------------
packages/__init__.py                                                                                              0       0  100.00%
packages/ragbits-cli/src/ragbits/cli/__init__.py                                                                 26       4  84.62%   66-67, 74-75
packages/ragbits-cli/src/ragbits/cli/_utils.py                                                                   23       4  82.61%   45, 62-64
packages/ragbits-cli/src/ragbits/cli/state.py                                                                    54       8  85.19%   47-48, 57, 60-61, 104, 111-112
packages/ragbits-core/src/ragbits/core/__init__.py                                                                0       0  100.00%
packages/ragbits-core/src/ragbits/core/cli.py                                                                     6       0  100.00%
packages/ragbits-core/src/ragbits/core/config.py                                                                 17       0  100.00%
packages/ragbits-core/src/ragbits/core/options.py                                                                17       0  100.00%
packages/ragbits-core/src/ragbits/core/types.py                                                                   9       0  100.00%
packages/ragbits-core/src/ragbits/core/audit/__init__.py                                                         67       6  91.04%   40-48
packages/ragbits-core/src/ragbits/core/audit/base.py                                                             32       0  100.00%
packages/ragbits-core/src/ragbits/core/audit/otel.py                                                             36      16  55.56%   20-21, 35-41, 51-54, 64-67, 96
packages/ragbits-core/src/ragbits/core/embeddings/__init__.py                                                     4       0  100.00%
packages/ragbits-core/src/ragbits/core/embeddings/base.py                                                        20       2  90.00%   56, 69
packages/ragbits-core/src/ragbits/core/embeddings/exceptions.py                                                  17       7  58.82%   7-8, 17, 26-27, 36, 45
packages/ragbits-core/src/ragbits/core/embeddings/litellm.py                                                     38      18  52.63%   78-112
packages/ragbits-core/src/ragbits/core/embeddings/noop.py                                                         8       0  100.00%
packages/ragbits-core/src/ragbits/core/llms/__init__.py                                                           3       0  100.00%
packages/ragbits-core/src/ragbits/core/llms/base.py                                                              51       9  82.35%   49, 68, 152-160, 163-165
packages/ragbits-core/src/ragbits/core/llms/factory.py                                                           12       3  75.00%   30, 41, 51
packages/ragbits-core/src/ragbits/core/llms/litellm.py                                                           35      10  71.43%   79, 85-106
packages/ragbits-core/src/ragbits/core/llms/clients/__init__.py                                                   4       0  100.00%
packages/ragbits-core/src/ragbits/core/llms/clients/base.py                                                      14       0  100.00%
packages/ragbits-core/src/ragbits/core/llms/clients/exceptions.py                                                17       7  58.82%   7-8, 17, 26-27, 36, 45
packages/ragbits-core/src/ragbits/core/llms/clients/litellm.py                                                   72      18  75.00%   113, 148-169, 191-196, 207
packages/ragbits-core/src/ragbits/core/llms/clients/local.py                                                     51      24  52.94%   9-12, 64-72, 93-104, 125-141
packages/ragbits-core/src/ragbits/core/metadata_stores/__init__.py                                                3       0  100.00%
packages/ragbits-core/src/ragbits/core/metadata_stores/base.py                                                   11       0  100.00%
packages/ragbits-core/src/ragbits/core/metadata_stores/exceptions.py                                              4       0  100.00%
packages/ragbits-core/src/ragbits/core/metadata_stores/in_memory.py                                              16       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/__init__.py                                                         2       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/_cli.py                                                            44      21  52.27%   25-33, 47-49, 63-65, 73-75, 89-97
packages/ragbits-core/src/ragbits/core/prompt/base.py                                                            20       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/parsers.py                                                         35       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/prompt.py                                                         126       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/discovery/__init__.py                                               2       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/discovery/prompt_discovery.py                                      33       2  93.94%   54-55
packages/ragbits-core/src/ragbits/core/utils/__init__.py                                                          0       0  100.00%
packages/ragbits-core/src/ragbits/core/utils/_pyproject.py                                                       38       1  97.37%   113
packages/ragbits-core/src/ragbits/core/utils/config_handling.py                                                  72       8  88.89%   16, 54-55, 62-63, 152-154
packages/ragbits-core/src/ragbits/core/utils/decorators.py                                                       29       0  100.00%
packages/ragbits-core/src/ragbits/core/utils/dict_transformations.py                                             72       3  95.83%   24, 27, 108
packages/ragbits-core/src/ragbits/core/vector_stores/__init__.py                                                  3       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/_cli.py                                                     51       4  92.16%   63, 85, 91, 121
packages/ragbits-core/src/ragbits/core/vector_stores/base.py                                                     40       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/chroma.py                                                   59       1  98.31%   77
packages/ragbits-core/src/ragbits/core/vector_stores/in_memory.py                                                38       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/pgvector.py                                                114       1  99.12%   108
packages/ragbits-core/src/ragbits/core/vector_stores/qdrant.py                                                   62       1  98.39%   77
packages/ragbits-core/tests/cli/__init__.py                                                                       0       0  100.00%
packages/ragbits-core/tests/cli/test_vector_store.py                                                            103       0  100.00%
packages/ragbits-core/tests/integration/vector_stores/test_vector_store.py                                       31       0  100.00%
packages/ragbits-core/tests/unit/__init__.py                                                                      0       0  100.00%
packages/ragbits-core/tests/unit/test_options.py                                                                 21       0  100.00%
packages/ragbits-core/tests/unit/audit/__init__.py                                                                0       0  100.00%
packages/ragbits-core/tests/unit/audit/test_otel.py                                                               7       0  100.00%
packages/ragbits-core/tests/unit/audit/test_trace.py                                                             88       3  96.59%   13, 16, 19
packages/ragbits-core/tests/unit/embeddings/test_from_config.py                                                  14       0  100.00%
packages/ragbits-core/tests/unit/llms/__init__.py                                                                 0       0  100.00%
packages/ragbits-core/tests/unit/llms/test_from_config.py                                                        17       0  100.00%
packages/ragbits-core/tests/unit/llms/test_litellm.py                                                            64       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/__init__.py                                                         0       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/test_get_default_llm.py                                            12       0  100.00%
packages/ragbits-core/tests/unit/metadata_stores/__init__.py                                                      0       0  100.00%
packages/ragbits-core/tests/unit/metadata_stores/test_from_config.py                                             11       0  100.00%
packages/ragbits-core/tests/unit/metadata_stores/test_in_memory.py                                               22       0  100.00%
packages/ragbits-core/tests/unit/prompts/__init__.py                                                              0       0  100.00%
packages/ragbits-core/tests/unit/prompts/test_parsers.py                                                         65       0  100.00%
packages/ragbits-core/tests/unit/prompts/test_prompt.py                                                         165       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/__init__.py                                                    0       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/prompt_classes_for_tests.py                                   30       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/test_prompt_discovery.py                                      18       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/__init__.py                     2       1  50.00%   3
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/__init__.py             3       2  33.33%   2-4
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/temp_prompt1.py        14       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/temp_prompt2.py        14       0  100.00%
packages/ragbits-core/tests/unit/utils/__init__.py                                                                0       0  100.00%
packages/ragbits-core/tests/unit/utils/test_config_handling.py                                                   65       2  96.92%   27-28
packages/ragbits-core/tests/unit/utils/test_decorators.py                                                        26       2  92.31%   17, 39
packages/ragbits-core/tests/unit/utils/test_dict_transformations.py                                              69       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_find.py                                                    13       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_get_config.py                                               9       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_get_instace.py                                             37       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/__init__.py                                                        0       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_chroma.py                                                    61       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_from_config.py                                               38       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_in_memory.py                                                 84       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_pgvector.py                                                 157       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_qdrant.py                                                    40       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/__init__.py                                          2       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/_main.py                                            92       4  95.65%   203-210, 213
packages/ragbits-document-search/src/ragbits/document_search/documents/__init__.py                                0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/documents/document.py                               61       2  96.72%   99, 146
packages/ragbits-document-search/src/ragbits/document_search/documents/element.py                                78      12  84.62%   87, 167-174, 183-185
packages/ragbits-document-search/src/ragbits/document_search/documents/exceptions.py                             11       5  54.55%   7-8, 17, 26-27
packages/ragbits-document-search/src/ragbits/document_search/documents/sources.py                               116      13  88.79%   130, 213-218, 255-258, 262-263
packages/ragbits-document-search/src/ragbits/document_search/ingestion/__init__.py                                0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/document_processor.py                     33       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/processor_strategies/__init__.py           5       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/processor_strategies/base.py              25       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/processor_strategies/batched.py           18       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/processor_strategies/distributed.py       30       8  73.33%   8-9, 36, 64-71
packages/ragbits-document-search/src/ragbits/document_search/ingestion/processor_strategies/sequential.py        13       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/__init__.py                      3       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/base.py                         19       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/dummy.py                        20       7  65.00%   33, 54-60
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/__init__.py         4       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/default.py         46       4  91.30%   98, 103-104, 137
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/images.py          50      19  62.00%   73-80, 87-99, 111, 124
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/pdf.py             19       6  68.42%   23, 35-43
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/utils.py           38      11  71.05%   71, 82-83, 98-101, 110, 121-123
packages/ragbits-document-search/src/ragbits/document_search/retrieval/__init__.py                                0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/__init__.py                     5       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/base.py                         9       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/llm.py                         25       4  84.00%   47-50
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/noop.py                         6       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/prompts.py                     16       1  93.75%   52
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/__init__.py                      3       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/base.py                         17       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/litellm.py                      16       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/noop.py                          9       0  100.00%
packages/ragbits-document-search/tests/__init__.py                                                                0       0  100.00%
packages/ragbits-document-search/tests/helpers.py                                                                 3       0  100.00%
packages/ragbits-document-search/tests/integration/__init__.py                                                    0       0  100.00%
packages/ragbits-document-search/tests/integration/test_rerankers.py                                             15       6  60.00%   18-38
packages/ragbits-document-search/tests/integration/test_sources.py                                               23      10  56.52%   22-32, 40-45
packages/ragbits-document-search/tests/integration/test_unstructured.py                                          48      10  79.17%   52-58, 71-77
packages/ragbits-document-search/tests/unit/test_document_processor.py                                           17       0  100.00%
packages/ragbits-document-search/tests/unit/test_document_search.py                                              88       0  100.00%
packages/ragbits-document-search/tests/unit/test_documents.py                                                    13       0  100.00%
packages/ragbits-document-search/tests/unit/test_elements.py                                                     19       0  100.00%
packages/ragbits-document-search/tests/unit/test_local_file_source.py                                            13       0  100.00%
packages/ragbits-document-search/tests/unit/test_processing_strategies.py                                        25       0  100.00%
packages/ragbits-document-search/tests/unit/test_providers.py                                                    41       0  100.00%
packages/ragbits-document-search/tests/unit/test_rephrasers.py                                                   26       0  100.00%
packages/ragbits-document-search/tests/unit/test_rerankers.py                                                    51       1  98.04%   23
packages/ragbits-document-search/tests/unit/test_source_discriminator.py                                         35       0  100.00%
packages/ragbits-document-search/tests/unit/test_sources.py                                                      25       0  100.00%
packages/ragbits-guardrails/src/ragbits/guardrails/__init__.py                                                    0       0  100.00%
packages/ragbits-guardrails/src/ragbits/guardrails/base.py                                                       15       0  100.00%
packages/ragbits-guardrails/src/ragbits/guardrails/openai_moderation.py                                          19       5  73.68%   29-33
packages/ragbits-guardrails/tests/unit/test_openai_moderation.py                                                 35       0  100.00%
TOTAL                                                                                                          4107     316  92.31%

Diff against main

Filename                                                            Stmts    Miss  Cover
----------------------------------------------------------------  -------  ------  --------
packages/ragbits-core/src/ragbits/core/vector_stores/pgvector.py     +114      +1  +99.12%
packages/ragbits-core/tests/unit/vector_stores/test_pgvector.py      +157       0  +100.00%
TOTAL                                                                +271      +1  +0.52%

Results for commit: 204b4cb

Minimum allowed coverage is 60%

♻️ This comment has been updated with latest results

Contributor

github-actions bot commented Jan 8, 2025

Trivy scanning results.

.venv/lib/python3.10/site-packages/PyJWT-2.9.0.dist-info/METADATA (secrets)

Total: 1 (MEDIUM: 1, HIGH: 0, CRITICAL: 0)

MEDIUM: JWT (jwt-token)
════════════════════════════════════════
JWT token
────────────────────────────────────────
.venv/lib/python3.10/site-packages/PyJWT-2.9.0.dist-info/METADATA:80
────────────────────────────────────────
78 >>> encoded = jwt.encode({"some": "payload"}, "secret", algorithm="HS256")
79 >>> print(encoded)
80 [ *********************************************************************************************************
81 >>> jwt.decode(encoded, "secret", algorithms=["HS256"])
────────────────────────────────────────

.venv/lib/python3.10/site-packages/litellm/llms/huggingface/huggingface_llms_metadata/hf_text_generation_models.txt (secrets)

Total: 1 (MEDIUM: 0, HIGH: 0, CRITICAL: 1)

CRITICAL: HuggingFace (hugging-face-access-token)
════════════════════════════════════════
Hugging Face Access Token
────────────────────────────────────────
.venv/lib/python3.10/site-packages/litellm/llms/huggingface/huggingface_llms_metadata/hf_text_generation_models.txt:36162
────────────────────────────────────────
36160 mncai/Llama2-7B-Active_3rd-floor-LoRA-dim64_epoch4
36161 ajcdp/CM
36162 [ Nagharjun17/*************************************
36163 BigSalmon/InformalToFormalLincoln114Paraphrase
────────────────────────────────────────

.venv/lib/python3.10/site-packages/litellm/proxy/_types.py (secrets)

Total: 1 (MEDIUM: 1, HIGH: 0, CRITICAL: 0)

MEDIUM: Slack (slack-web-hook)
════════════════════════════════════════
Slack Webhook
────────────────────────────────────────
.venv/lib/python3.10/site-packages/litellm/proxy/_types.py:1314
────────────────────────────────────────
1312 alert_to_webhook_url: Optional[Dict] = Field(
1313 None,
1314 [ bhook_url: {'budget_alerts': '*****************************************************************************'}`",
1315 )
────────────────────────────────────────

@micpst micpst linked an issue Jan 8, 2025 that may be closed by this pull request
@micpst micpst changed the title Kz/253 pgvector support feat: add support to pgvector Jan 8, 2025
@kzamlynska kzamlynska marked this pull request as draft January 8, 2025 15:25
metadata_store: The metadata store to use. If None, the metadata will be stored in pgVector db.
"""
super().__init__(default_options=default_options, metadata_store=metadata_store)
conf = PgVectorConfig()
Collaborator

This part is confusing to me. If you want to provide default values for some of the __init__ arguments, why not put them directly in the method signature (e.g., vector_size: int = 512)?

It also seems to me that the db argument should not have a default value and instead be required.
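
A minimal sketch of what this suggestion could look like. The class name `PgVectorStoreSketch`, the `table_name` parameter, and its default are hypothetical and used only for illustration; only `vector_size: int = 512` and the required `db` argument come from the comment above.

```python
from typing import Any


class PgVectorStoreSketch:
    """Hypothetical stand-in for the store under review, not the PR's actual class."""

    def __init__(
        self,
        db: Any,  # required: no default, so a missing connection fails at the call site
        table_name: str = "vector_store",  # assumed default, for illustration only
        vector_size: int = 512,  # default value taken from the reviewer's example
    ) -> None:
        self.db = db
        self.table_name = table_name
        self.vector_size = vector_size
```

With defaults in the signature, they are visible in the generated documentation and in IDE hints, and omitting `db` raises a `TypeError` immediately instead of falling back to a hidden configuration object.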

Collaborator Author

removed

create_command = self._create_table_command()
await conn.execute(create_command)
hnsw_name = self.table_name + "_hnsw_idx"
query = create_index_query.format(
Collaborator

We should never format SQL queries using string operations: this creates a SQL injection risk and other problems caused by unexpected characters. You should always pass values the way you did on line 105, as arguments to the relevant function of the SQL library.
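
To illustrate the principle of passing values as arguments, here is a small sketch. It uses the standard-library `sqlite3` module purely so that it runs without a database server; this is an assumption for demonstration, not the driver used in the PR. PostgreSQL drivers apply the same idea with their own placeholder syntax (for example `$1` in asyncpg or `%s` in psycopg). The table and column names are hypothetical.

```python
import sqlite3

# In-memory database so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")

# A value that would break out of a string-formatted query.
hostile = "x'); DROP TABLE items; --"

# The value travels separately from the SQL text, so it is stored as plain data.
conn.execute("INSERT INTO items (label) VALUES (?)", (hostile,))
rows = conn.execute(
    "SELECT label FROM items WHERE label = ?", (hostile,)
).fetchall()
```

Bind parameters cover values only. Identifiers such as table or index names cannot be bound this way, so they need separate validation or quoting.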

Collaborator Author

I've added a check on the table name in __init__.
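
The check itself is not shown in this thread. One possible shape, assuming an allow-list regular expression, is sketched below; the function name `validate_table_name` and the exact pattern are hypothetical. The 63-character cap reflects PostgreSQL's default identifier length limit.

```python
import re

# Letters, digits and underscores only, not starting with a digit.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_MAX_IDENTIFIER_LENGTH = 63  # PostgreSQL's default limit for identifiers


def validate_table_name(name: str) -> str:
    """Return the name unchanged if it is a plain identifier, else raise ValueError."""
    if len(name) > _MAX_IDENTIFIER_LENGTH or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"Invalid table name: {name!r}")
    return name
```

A check like this narrows what can reach the formatted query, but it only protects code paths that go through it, which is why the reviewer's concern about string-built SQL still applies to any other interpolated value.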

"""
distance_operator = PgVectorDistance.DISTANCE_OPS[self.distance_method][1]

query = f"SELECT * FROM {self.table_name}" # noqa S608
Collaborator

Again, any value added to an SQL query via an f-string opens the door to SQL injection and to bugs caused by characters that have not been escaped.
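
When an identifier such as the table name has to appear in the query text, one common alternative to raw interpolation is to quote it as an SQL identifier and keep every value as a bind parameter. The sketch below only builds the query string and argument tuple; `quote_ident` and `build_select` are hypothetical helpers, and the `$1` placeholder assumes an asyncpg-style driver. Libraries such as psycopg ship their own identifier-composition helpers, which would be preferable where available.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier: wrap it in double quotes and double any embedded ones."""
    if "\x00" in name:
        raise ValueError("identifier must not contain NUL bytes")
    return '"' + name.replace('"', '""') + '"'


def build_select(table_name: str, limit: int) -> tuple[str, tuple[int, ...]]:
    """Build a SELECT with a quoted table name and the limit as a bind parameter."""
    query = f"SELECT * FROM {quote_ident(table_name)} LIMIT $1"
    return query, (limit,)


query, args = build_select("vector_store", 5)
```

Here a hostile table name stays inside the quoted identifier and cannot terminate the statement, while the limit never touches the SQL text at all.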

@kzamlynska kzamlynska marked this pull request as ready for review January 10, 2025 08:49
Development

Successfully merging this pull request may close these issues.

feat: add support to pgvector
2 participants