Required prerequisites
Motivation
Preference datasets are used for reward model training, DPO training, and ORPO training. For each system instruction and human input, a preference dataset provides a better answer and a worse answer.
So I think preference data generation is really important.
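To make the request concrete, here is a minimal sketch of what one preference record could look like. The class name `PreferenceRecord`, the field names (`system`, `prompt`, `chosen`, `rejected`), and the `to_jsonl` helper are assumptions for illustration only, not an existing API in this project.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PreferenceRecord:
    """One preference example (hypothetical schema, field names are assumptions)."""
    system: str    # system instruction
    prompt: str    # human input
    chosen: str    # the better answer
    rejected: str  # the worse answer


def to_jsonl(records):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


example = PreferenceRecord(
    system="You are a helpful assistant.",
    prompt="What is the capital of France?",
    chosen="The capital of France is Paris.",
    rejected="France is a country in Europe.",
)

jsonl_text = to_jsonl([example])
```

A generator for this kind of data would need to produce both the `chosen` and the `rejected` answer for the same system instruction and human input, so that reward model, DPO, or ORPO training can compare them.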
Solution
Support this in both core and the cookbook.
Alternatives
No response
Additional context
No response