[FEATURE] Any plans to extend KnowledgeStorage for custom storage solutions? #1995

Open
ttinh opened this issue Jan 29, 2025 · 0 comments
Labels
feature-request New feature or request

Feature Area

Core functionality

Is your feature request related to an existing bug? Please link it here.

Hello,

First off, awesome project! Love the work.

I'm trying to implement a custom KnowledgeStorage class that uploads my embeddings to Milvus instead of ChromaDB. After some digging, it looks like a KnowledgeSource's storage property is disregarded when crew._knowledge is populated by Crew.create_crew_knowledge and Agent._set_knowledge: Knowledge.__init__ (quoted here) assigns its own storage to every source before calling source.add().

class Knowledge(BaseModel):
    """
    Knowledge is a collection of sources and setup for the vector store to save and query relevant context.
    Args:
        sources: List[BaseKnowledgeSource] = Field(default_factory=list)
        storage: Optional[KnowledgeStorage] = Field(default=None)
        embedder_config: Optional[Dict[str, Any]] = None
    """

    sources: List[BaseKnowledgeSource] = Field(default_factory=list)
    model_config = ConfigDict(arbitrary_types_allowed=True)
    storage: Optional[KnowledgeStorage] = Field(default=None)
    embedder_config: Optional[Dict[str, Any]] = None
    collection_name: Optional[str] = None

    def __init__(
        self,
        collection_name: str,
        sources: List[BaseKnowledgeSource],
        embedder_config: Optional[Dict[str, Any]] = None,
        storage: Optional[KnowledgeStorage] = None,
        **data,
    ):
        super().__init__(**data)
        if storage:
            self.storage = storage
        else:
            self.storage = KnowledgeStorage(
                embedder_config=embedder_config, collection_name=collection_name
            )
        self.sources = sources
        self.storage.initialize_knowledge_storage()
        for source in sources:
            source.storage = self.storage
            source.add()

Are there plans to support custom KnowledgeStorage classes? Or am I missing something here?
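
The overwrite described above can be reproduced without crewAI installed. The following is a minimal sketch, not crewAI code: FakeStorage, FakeSource, and FakeKnowledge are hypothetical stand-ins that mirror only the assignment order in the quoted Knowledge.__init__, and the save signature is simplified for the demonstration.

```python
from typing import List, Optional


class FakeStorage:
    """Hypothetical stand-in for KnowledgeStorage; records what gets saved."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.saved: List[str] = []

    def initialize_knowledge_storage(self) -> None:
        pass  # the real class would set up its vector store here

    def save(self, documents: List[str]) -> None:
        self.saved.extend(documents)


class FakeSource:
    """Hypothetical stand-in for a knowledge source with a storage attribute."""

    def __init__(self, text: str, storage: Optional[FakeStorage] = None) -> None:
        self.text = text
        self.storage = storage

    def add(self) -> None:
        self.storage.save([self.text])


class FakeKnowledge:
    """Mirrors the assignment order of the quoted Knowledge.__init__."""

    def __init__(
        self, sources: List[FakeSource], storage: Optional[FakeStorage] = None
    ) -> None:
        self.storage = storage if storage else FakeStorage("default")
        self.storage.initialize_knowledge_storage()
        for source in sources:
            source.storage = self.storage  # any per-source storage is replaced here
            source.add()


custom = FakeStorage("milvus")
source = FakeSource("hello", storage=custom)
FakeKnowledge(sources=[source])  # no storage passed, so the default is created

print(source.storage.name)  # default
print(custom.saved)         # []
```

In this sketch the custom storage only survives when it is handed to the FakeKnowledge constructor itself, which matches the `if storage:` branch in the quoted code.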

Describe the solution you'd like

A way to specify a custom KnowledgeStorage class.
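
One possible shape for this, as a sketch only: a structural interface that any backend could satisfy. KnowledgeStorageLike and InMemoryStorage are hypothetical names that do not exist in crewAI; the method names and signatures are copied from the subclass shown later in this issue, and the toy backend uses substring matching instead of embeddings.

```python
from typing import Any, Dict, List, Optional, Protocol, Union, runtime_checkable


@runtime_checkable
class KnowledgeStorageLike(Protocol):
    """Hypothetical interface; method names follow the subclass in this issue."""

    def initialize_knowledge_storage(self) -> None: ...

    def save(
        self,
        documents: List[str],
        metadata: Union[Dict[str, Any], List[Dict[str, Any]]],
    ) -> None: ...

    def search(
        self,
        query: List[str],
        limit: int = 3,
        filter: Optional[dict] = None,
        score_threshold: float = 0.35,
    ) -> List[Dict[str, Any]]: ...

    def reset(self) -> None: ...


class InMemoryStorage:
    """Toy backend: substring match stands in for vector similarity."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []

    def initialize_knowledge_storage(self) -> None:
        self.rows = []

    def save(self, documents, metadata) -> None:
        # Accept one metadata dict for all documents or one dict per document.
        if isinstance(metadata, dict):
            metadata = [metadata] * len(documents)
        for doc, meta in zip(documents, metadata):
            self.rows.append({"context": doc, "metadata": meta, "score": 1.0})

    def search(self, query, limit=3, filter=None, score_threshold=0.35):
        # `filter` is accepted for signature compatibility but ignored here.
        hits = [
            row for row in self.rows
            if any(q.lower() in row["context"].lower() for q in query)
        ]
        return [h for h in hits if h["score"] >= score_threshold][:limit]

    def reset(self) -> None:
        self.rows = []


storage = InMemoryStorage()
storage.initialize_knowledge_storage()
storage.save(
    ["Milvus is a vector database", "ChromaDB is the default"],
    {"source": "notes"},
)
results = storage.search(["milvus"])
print(isinstance(storage, KnowledgeStorageLike))
print([r["context"] for r in results])
```

With an interface like this, the knowledge setup would only need to accept any object that passes the check rather than constructing KnowledgeStorage itself.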

Describe alternatives you've considered

I tried writing a subclass of KnowledgeStorage, hoping I could pass it as a PDFKnowledgeSource's storage property.
Implementation redacted for conciseness.

class MilvusKnowledgeStorage(KnowledgeStorage):
    
    ...

    def __init__(
        self,
        embedder_config: Optional[Dict[str, Any]] = None,
        collection_name: Optional[str] = None,
        db_name: Optional[str] = None,
        username: Optional[str] = None,
        password: Optional[str] = None,
        uri: Optional[str] = None
    ):
        # initialization for vars and embeddings client
        ...
        
    def emb_string(self, data):
        response = self.openai_client.embeddings.create(input=data, model=self.EMBEDDING_MODEL, dimensions=self.EMBEDDING_DIMS)
        return [doc.embedding for doc in response.data]

    def search(
        self,
        query: List[str],
        limit: int = 3,
        filter: Optional[dict] = None,
        score_threshold: float = 0.35,
    ) -> List[Dict[str, Any]]:
        # compute embeddings on query
        # milvus.search(...)
        # return results
        ...

    def initialize_knowledge_storage(self):
        milvus_client = pymilvus.MilvusClient(
            uri=self.uri,
            user=self.username,
            password=self.password,
            db_name=self.db_name
        )

        self.app = milvus_client
        # check for client ready and db_name, collection existence

    def reset(self):
        """Reset the Milvus knowledge base by dropping and re-creating the collection."""
        if self.app.has_collection(self.collection_name):
            self.app.drop_collection(self.collection_name)
        
        self.app.create_collection(
            ...
        )
        
    def save(
        self,
        documents: List[str],
        metadata: Union[Dict[str, Any], List[Dict[str, Any]]],
    ) -> None:
        # logic for computing embeddings per document and upserting to Milvus
        ...

    def _set_embedder_config(
        self, embedder_config: Optional[Dict[str, Any]] = None
    ) -> None:
        """Set the embedding configuration for the knowledge storage.

        Args:
            embedder_config (Optional[Dict[str, Any]]): Configuration dictionary for the embedder.
                If None or empty, defaults to the default embedding function.
        """
        # set up OpenAI embeddings client
        ...
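
For the redacted "return results" step in search, a sketch of how hits might be filtered by score_threshold. Everything here is an assumption rather than the redacted implementation: hits_to_results is a hypothetical helper, the input is assumed to be one list of hit dicts per query with "id", "distance", and "entity" keys, the "text" and "metadata" entity fields are made-up field names, and a higher distance is assumed to mean a closer match (as with a cosine or inner-product metric).

```python
from typing import Any, Dict, List


def hits_to_results(
    hits_per_query: List[List[Dict[str, Any]]],
    score_threshold: float = 0.35,
    limit: int = 3,
) -> List[Dict[str, Any]]:
    """Flatten per-query hits, drop low scores, return the best `limit`."""
    results: List[Dict[str, Any]] = []
    for hits in hits_per_query:
        for hit in hits:
            score = hit["distance"]  # assumed: higher means more similar
            if score < score_threshold:
                continue
            entity = hit.get("entity", {})
            results.append(
                {
                    "id": hit["id"],
                    "context": entity.get("text", ""),  # hypothetical field name
                    "metadata": entity.get("metadata", {}),  # hypothetical field name
                    "score": score,
                }
            )
    # Best matches first, then apply the overall limit.
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:limit]


sample_hits = [
    [
        {"id": 1, "distance": 0.82, "entity": {"text": "doc A", "metadata": {"page": 1}}},
        {"id": 2, "distance": 0.10, "entity": {"text": "doc B", "metadata": {"page": 2}}},
    ]
]
filtered = hits_to_results(sample_hits)
print([r["id"] for r in filtered])  # [1]
```

If the collection uses a distance metric where smaller is better (for example L2), the comparison and sort order would need to be inverted.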