
enhance: STaR Integration #1514

Merged
merged 13 commits into from
Jan 30, 2025

Conversation

Wendong-Fan
Member

Description

Enhancement based on review comment: #1478 (review).

Motivation and Context

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the examples folder)

Implemented Tasks

  • Subtask 1
  • Subtask 2
  • Subtask 3

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.


@Wendong-Fan Wendong-Fan requested a review from GitHoobar January 26, 2025 23:36
@Wendong-Fan Wendong-Fan marked this pull request as ready for review January 26, 2025 23:44
@Wendong-Fan Wendong-Fan self-assigned this Jan 26, 2025
@Wendong-Fan Wendong-Fan added the enhancement New feature or request label Jan 26, 2025
@Wendong-Fan Wendong-Fan changed the title Refactor: STaR Integration refactor: STaR Integration Jan 26, 2025
@Wendong-Fan Wendong-Fan changed the title refactor: STaR Integration enhance: STaR Integration Jan 26, 2025
@GitHoobar
Collaborator

Can we further add a method to validate the problem format?

@ZIYU-DEEP
Contributor

ZIYU-DEEP commented Jan 27, 2025

Thanks a lot for this PR! I just took a quick look, and here are a few comments:

  1. Deviation from STaR:
    This is a great implementation of an in-context, test-time self-improving reasoner! However, there are some deviations (or innovations!) from the original STaR method. For example, (i) this is an in-context method designed for test time, whereas STaR is a training-time method requiring reinforcement fine-tuning; (ii) there is no rationalization process as in STaR (i.e., given problem x and true solution y, generate the reasoning trace z), since this runs at test time; (iii) (minor) few-shot examples with reasoning traces are missing - we can probably add an entry somewhere to allow users to include them.
    I would suggest considering a rename for this feature, perhaps making it a more general self-improving reasoner?

  2. Other minor suggestions:

  • Solution generation: it looks like the current implementation generates one solution per iteration. If that is the case, we could extend it to generate multiple solutions in parallel per problem per iteration, then proceed with the highest-ranked one(s) in the next iteration - just like a tree, where the depth is the number of iterations and the width is the search budget for solutions per iteration. To improve diversity in solution generation, maybe we could allow the use of multiple different LLM APIs per generation.
  • Empirical results: it would be nice to add a README file in the examples folder for this implementation, to help users quickly understand the background and usage and to show some experimental results comparing this method with single-round prompting.
  • Add fine-tuning support in the future.

What do you think?

@Wendong-Fan
Member Author

Wendong-Fan commented Jan 29, 2025


Hey @ZIYU-DEEP, thanks for the review and happy Chinese New Year!

> (i) This is an in-context method designed for test time, whereas STaR is a training-time method requiring reinforcement fine-tuning.

I think our goal here is to incorporate only the data-generation aspect of STaR into this module. Model training can still be achieved by using the generated reasoning data in a fine-tuning pipeline, which can then be validated within our current pipeline. Including fine-tuning directly in this module would reduce modularity and increase complexity. WDYT?

> (ii) There is no rationalization process as in STaR (i.e., given problem x and true solution y, generate the reasoning trace z).

Agree and updated:

    def generate(self, rationalization: bool = False) -> List[Dict[str, Any]]:
        r"""Execute the STaR pipeline on all problems.

        Process problems and return results. If output_path is specified,
        also save results to file.

        Args:
            rationalization (bool, optional): Whether to use rationalization.
                (default: :obj:`False`)

        Returns:
            List[Dict[str, Any]]: List of processed results
        """

> (iii) Few-shot examples with reasoning traces are missing - we can probably add an entry somewhere to allow users to include them.

Agree and updated:

    def __init__(
        self,
        reason_agent: ChatAgent,
        evaluate_agent: ChatAgent,
        problems: List[Dict],
        max_iterations: int = 3,
        score_threshold: Union[float, Dict[str, float]] = 0.7,
        reward_model: Optional[BaseRewardModel] = None,
        output_path: Optional[str] = None,
        few_shot_examples: Optional[str] = None,
    ):
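For illustration, here is a hypothetical helper showing how the new few_shot_examples string could be threaded into the reasoning prompt. The function name and prompt wording are assumptions, not the actual implementation:

```python
from typing import Optional

def build_reasoning_prompt(problem: str, few_shot_examples: Optional[str] = None) -> str:
    """Assemble the reasoning prompt, prepending optional few-shot
    examples (problem/reasoning-trace pairs supplied by the user)."""
    parts = []
    if few_shot_examples:
        parts.append(
            "Here are some example solutions with reasoning traces:\n"
            + few_shot_examples
        )
    parts.append("Problem:\n" + problem)
    parts.append("Think step by step, then give the final answer.")
    return "\n\n".join(parts)
```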

> Generating multiple solutions in parallel per problem per iteration

I agree that generating multiple solutions at once could enhance diversity and improve quality within one iteration. However, the improvement it brings over iterative refinement after evaluation may be limited. Additionally, it would increase token consumption and runtime, so I think it may not be necessary for now. WDYT?
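For reference, the best-of-n variant being discussed could be sketched like this - a generic sketch with hypothetical callables, not part of this PR. Depth of the search tree would come from the outer iteration loop; this handles the width:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

def best_of_n(
    problem: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 4,
) -> Tuple[str, float]:
    """Generate n candidate solutions in parallel and keep the
    highest-scored one for the next iteration."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(generate, [problem] * n))
    scored = [(c, score(problem, c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])
```

Diversity could then come from giving each worker a different model backend instead of the same `generate` callable.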

> Adding a README file in the examples folder to help users quickly understand the background and usage

Agree and updated.

> Adding fine-tuning support in the future

Fine-tuning will be an independent module that integrates with the current pipeline.

@Wendong-Fan
Member Author

> Can we further add a method to validate the problem format?

Thanks @GitHoobar, added validation.
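A problem-format check along these lines is what the request amounts to; this is a hypothetical sketch (the required/optional key sets are assumptions about the schema, not the PR's actual validator):

```python
from typing import Any, Dict, List

REQUIRED_KEYS = {"problem"}  # assumed schema; adjust to the pipeline's real fields
OPTIONAL_KEYS = {"id", "type", "solution"}

def validate_problem_format(problems: List[Dict[str, Any]]) -> None:
    """Raise ValueError on the first malformed problem entry."""
    if not isinstance(problems, list):
        raise ValueError("problems must be a list of dicts")
    for i, p in enumerate(problems):
        if not isinstance(p, dict):
            raise ValueError(f"problem {i} must be a dict, got {type(p).__name__}")
        missing = REQUIRED_KEYS - p.keys()
        if missing:
            raise ValueError(f"problem {i} is missing required keys: {sorted(missing)}")
        if not isinstance(p["problem"], str) or not p["problem"].strip():
            raise ValueError(f"problem {i}: 'problem' must be a non-empty string")
```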

@GitHoobar
Collaborator

Thanks for the addition!

@Wendong-Fan Wendong-Fan merged commit 44bea0d into feat/star-datagen Jan 30, 2025
@Wendong-Fan Wendong-Fan deleted the star_enhance_wd branch January 30, 2025 10:39