-
Notifications
You must be signed in to change notification settings - Fork 730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Hybrid Retrieval #1398
base: master
Are you sure you want to change the base?
feat: Hybrid Retrieval #1398
Conversation
camel/retrievers/hybrid_retrival.py
Outdated
from camel.types import EmbeddingModelType, StorageType | ||
|
||
|
||
class HybridRetriever: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! Maybe extending BaseRetriever
here would help maintain consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous thought was that HybridRetriever
and AutoRetrival
were similar classes, not the base retrival component. I noticed that AutoRetrival
also does not inherit BaseRetrival
, so I'm not sure if they should both inherit BaseRetrival
, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing this out, what do you think @Wendong-Fan ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @liuxukun2000 and @yiyiyi0817! The BaseRetriever
is a minimal, abstract base class, while the HybridRetriever
extends both VectorRetriever
and BM25Retriever
, operating at a higher level. However, we can still inherit from BaseRetriever
to implement the process
method.
I think it would be better not to include AutoRetriever
directly within HybridRetriever
. AutoRetriever
is a simple implementation designed to allow users to quickly run our RAG pipeline. Its primary purpose is to provide an easy entry point for users to try our RAG functionality, so it should remain at the top level for user interaction. HybridRetriever
should depend solely on VectorRetriever
and BM25Retriever
. Perhaps later, we can consider integrating HybridRetriever
into AutoRetriever
for an enhanced user experience. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WenDong OK, I agree with you.
camel/retrievers/hybrid_retrival.py
Outdated
contents=self.content_input_path, | ||
top_k=vector_retriever_top_k, | ||
similarity_threshold=vector_retriever_similarity_threshold, | ||
return_detailed_info=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be return_detailed_info=return_detailed_info
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your message. I think the setting of return_detailed_info=True
here is intended to obtain detailed results from auto_retriever
, and this is fixed. The return_detailed_info
parameter of the query
function is used to specify whether to finally return detialed results that include the rrf score
. However, this part may be refactored later to become vector_retriever
.
return assistant_response.msg.content | ||
|
||
|
||
print(single_agent("What is it like to be a visiting student at KAUST?")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to include the expected output for single_agent("What is it like to be a visiting student at KAUST?")
at the end of the file, consistent with other examples.
camel/retrievers/hybrid_retrival.py
Outdated
"Original Query": query, | ||
"Retrieved Context": text_retrieved_info, | ||
} | ||
if return_detailed_info: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic in this section can be simplified to avoid redundancy.
if return_detailed_info: | |
retrieved_info = { | |
"Original Query": query, | |
"Retrieved Context": all_retrieved_info if return_detailed_info else [item['text'] for item in all_retrieved_info], | |
} | |
return retrieved_info |
camel/retrievers/hybrid_retrival.py
Outdated
from camel.types import EmbeddingModelType, StorageType | ||
|
||
|
||
class HybridRetriever: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing this out, what do you think @Wendong-Fan ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review. And I will recommit a new version soon.
camel/retrievers/hybrid_retrival.py
Outdated
from camel.types import EmbeddingModelType, StorageType | ||
|
||
|
||
class HybridRetriever: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WenDong OK, I agree with you.
Description
Hybrid Retrieval that combines auto retrieval and BM25 retrieval.
Motivation and Context
Types of changes
What types of changes does your code introduce? Put an
x
in all the boxes that apply:More Tasks
In the future, improvements can be made by separating the chunking and processing parts of the original vector-based retrieval and BM25 retrieval code, so that chunks can be uniformly numbered instead of relying on the current version's string matching deduplication operation.
Checklist
Go over all the following points, and put an
x
in all the boxes that apply.If you are unsure about any of these, don't hesitate to ask. We are here to help!