feat: implementing STaR: Self Taught Reasoner #1478
base: master
Thanks @GitHoobar for the contribution! I left some comments below.
Thanks for the contribution! Overall this looks great. I left some comments below and opened a follow-up enhancement PR here: https://github.com/camel-ai/camel/pull/1514/files. Feel free to review it and leave your comments!
camel/datagen/star/star_pipeline.py
prompt = STaRTemplates.improvement_template.format(
    problem=problem, trace=trace, feedback=feedback
)
response = self.agent.step(prompt)
Please resolve the conversation once you have updated the code.
The refactor looks good. Thanks @Wendong-Fan for the edits.
# Initialize reward model (optional)

# reward_model = NemotronRewardModel(
#     model_type=ModelType.NVIDIA_NEMOTRON_340B_REWARD,
#     url="https://integrate.api.nvidia.com/v1",
# )
Clean up unused code.
@Wendong-Fan, this will be used to generate the example. Update the env with a suitable API so that the reward model can be used correctly.
camel/datagen/star/star_pipeline.py
return evaluation.model_dump()

def improve_trace(self, problem: str, trace: str, feedback: str) -> str:
The original paper mentioned the term "rationalization," but I don't seem to see a similar implementation in this improvement method.
You're correct. The original STaR (Self-Taught Reasoner) paper uses rationalization. Our current implementation differs because it is a test-time method that generates reasoning directly, without access to ground-truth solutions.
We are focusing more on the data generation part at the moment.
cc: @Wendong-Fan
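For reference, a minimal sketch of what a rationalization step could look like if ground-truth answers were available: when the first attempt reaches the wrong answer, the correct answer is supplied as a hint and the model is asked to reason toward it. The names `rationalize`, `extract_answer`, and `generate`, as well as the prompt wording and the `Answer:` marker, are hypothetical and not part of this PR or of camel's API.

```python
from typing import Callable, Optional


def extract_answer(trace: str) -> str:
    """Hypothetical helper: return the text after the last 'Answer:' marker."""
    return trace.rsplit("Answer:", 1)[-1].strip()


def rationalize(
    problem: str,
    ground_truth: str,
    generate: Callable[[str], str],
) -> Optional[str]:
    """Return a trace that reaches ground_truth, or None if none is found.

    `generate` is an assumed callable mapping a prompt to model output.
    """
    # First attempt: no hint is given.
    trace = generate(
        f"Problem: {problem}\n"
        "Reason step by step, then finish with 'Answer: <answer>'."
    )
    if extract_answer(trace) == ground_truth:
        return trace

    # Rationalization: reveal the correct answer as a hint and retry.
    hinted = generate(
        f"Problem: {problem}\n"
        f"(Hint: the correct answer is {ground_truth}.)\n"
        "Reason step by step, then finish with 'Answer: <answer>'."
    )
    if extract_answer(hinted) == ground_truth:
        return hinted

    # Neither attempt reached the known answer, so the sample is dropped.
    return None
```

Because this requires a known answer per problem, it does not fit the test-time setting described above without an extra source of ground truth.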
json.dump(self.reasoning_traces, f, indent=2) | ||
|
||
# Templates for generating reasoning, evaluation and improving them. | ||
REASONING_TEMPLATE = """Let's solve this step by step: |
When constructing the reasoning prompt, consider adding few-shot examples, as this can improve the performance to some extent. The original paper also adopts this approach.
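A rough sketch of how few-shot examples could be prepended to the reasoning prompt. The function name `build_few_shot_prompt`, the `(problem, reasoning)` tuple format, and the `Example N` layout are assumptions for illustration, not the template used in this PR.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    problem: str,
    examples: List[Tuple[str, str]],
) -> str:
    """Prepend worked (problem, reasoning) pairs to the target problem."""
    parts = []
    for i, (ex_problem, ex_reasoning) in enumerate(examples, start=1):
        # Each example shows the model the expected reasoning format.
        parts.append(
            f"Example {i}\n"
            f"Problem: {ex_problem}\n"
            f"Reasoning: {ex_reasoning}\n"
        )
    # The target problem comes last and is left open for the model.
    parts.append(
        f"Problem: {problem}\n"
        "Reasoning: Let's solve this step by step:"
    )
    return "\n".join(parts)
```

With an empty `examples` list this reduces to the zero-shot prompt, so the same function covers both cases.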
This has been taken care of.
Thanks @GitHoobar @Wendong-Fan
Thanks @GitHoobar ! LGTM!
Co-authored-by: Rishabh <[email protected]>
Description
A basic implementation of STaR: Self Taught Reasoner
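As a rough illustration of the generate, evaluate, improve cycle that the snippets in this conversation suggest, here is a minimal sketch. The callables `generate`, `evaluate`, and `improve`, the `score` and `feedback` keys, and the `threshold` and `max_iterations` defaults are all assumptions for illustration; the actual pipeline in `camel/datagen/star/star_pipeline.py` may be structured differently.

```python
from typing import Any, Callable, Dict, List


def refine_trace(
    problem: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], Dict[str, Any]],
    improve: Callable[[str, str, str], str],
    threshold: float = 0.8,
    max_iterations: int = 3,
) -> List[Dict[str, Any]]:
    """Iteratively improve a reasoning trace until it scores well enough.

    Returns the full history so that every intermediate trace is kept.
    """
    history: List[Dict[str, Any]] = []
    trace = generate(problem)
    for iteration in range(max_iterations + 1):
        # Assumed evaluation format: {"score": float, "feedback": str}.
        evaluation = evaluate(problem, trace)
        history.append(
            {"iteration": iteration, "trace": trace, "evaluation": evaluation}
        )
        # Stop once the trace is good enough or the budget is spent.
        if evaluation["score"] >= threshold or iteration == max_iterations:
            break
        trace = improve(problem, trace, evaluation["feedback"])
    return history
```

Keeping the whole history rather than only the final trace makes it possible to dump every intermediate step for later inspection.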
Motivation and Context
Why is this change required? What problem does it solve?
close #1411
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Implemented Tasks
Checklist
Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!