For a Task with Structured Output, creating a fine-tuning job via the Fine Tune UI for an OpenAI model will compile the dataset in a tool_call format, where the structured output of the task is modeled as arguments to a task_response tool.
If the model is trained to output arguments to a task_response tool, usage at inference time should typically align with the same format to most benefit from the training. If the user were to call the model without a task_response tool for example, or with response_format instead, the model may behave inconsistently due to the mismatch with the training samples.
task_response as a tool does not seem to be documented at the moment.

[Sample format sent to OpenAI for fine-tuning]
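To make the above concrete: the attached sample format is authoritative, but my understanding (an assumption based on OpenAI's documented fine-tuning format for function calls, not a verbatim copy of the attachment) is that each compiled training line looks roughly like this, written out as a TypeScript object literal with a toy x/y schema standing in for the Task's structured output:

```ts
// One JSONL training line, sketched as a TypeScript object literal.
// The x/y fields are placeholders for the Task's structured output schema.
const sampleLine = {
  messages: [
    { role: 'system', content: '...system prompt...' },
    { role: 'user', content: '...task input...' },
    {
      role: 'assistant',
      // the expected structured output is serialized as the tool call's arguments
      tool_calls: [
        {
          id: 'call_1',
          type: 'function',
          function: {
            name: 'task_response',
            arguments: JSON.stringify({ x: 1, y: 2 }),
          },
        },
      ],
    },
  ],
  // the same task_response tool definition accompanies every sample
  tools: [
    {
      type: 'function',
      function: {
        name: 'task_response',
        parameters: {
          type: 'object',
          properties: { x: { type: 'number' }, y: { type: 'number' } },
        },
      },
    },
  ],
};
```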
To align with the sample format, the code calling the model at inference time might need to look like this:
```ts
const completion = await this.openai.chat.completions.create({
  model: 'my-tuned-model',
  messages: [...],
  // response_format: { ... }, // no response_format
  tool_choice: 'required', // the model must always call the tool
  tools: [
    {
      type: 'function',
      function: {
        name: 'task_response', // define a function tool called task_response like in the samples
        strict: true,
        parameters: {
          type: 'object',
          properties: { x: { type: 'number' }, y: { type: 'number' } }, // our Task's structured output will be here
          ...
        },
      },
    },
  ],
});

// get the response from the function call args of the `task_response` tool call
const output = JSON.parse(
  completion.choices[0].message.tool_calls[0].function.arguments,
) as { x: number; y: number };
```
Could you please clarify how code using the tuned model is expected to use the model at inference-time? I'd be happy to help with documenting this once intended usage is confirmed.
About the format in general, the OpenAI docs suggest that response_format might be more suitable when the structured output use case does not involve making actual function calls; they also show an example of fine-tuning for structured output by including the serialized JSON in content. However, the documentation does not elaborate on whether there are any meaningful differences beyond a slightly different interface and slightly more convenient parsing with the SDK.
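For comparison, a minimal sketch of the response_format route at inference time might look like the following (the x/y schema is again a stand-in for the Task's output schema, and whether a model tuned on the task_response format responds well to this is exactly the open question above):

```ts
import OpenAI from 'openai';

const openai = new OpenAI();

const completion = await openai.chat.completions.create({
  model: 'my-tuned-model', // placeholder fine-tuned model id
  messages: [{ role: 'user', content: 'task input here' }],
  // structured output via JSON schema instead of a task_response tool
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'task_response',
      strict: true,
      schema: {
        type: 'object',
        properties: { x: { type: 'number' }, y: { type: 'number' } },
        required: ['x', 'y'],
        additionalProperties: false,
      },
    },
  },
});

// with response_format the structured output comes back as plain message content
const output = JSON.parse(completion.choices[0].message.content ?? '{}') as { x: number; y: number };
```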
I seem to notice negative side effects at inference time with a GPT-4o-mini model tuned on the task_response format, both when calling it with a task_response tool and parsing the arguments, and when trying it with response_format. For example, a tokenizer model tuned on that format often drops words and punctuation (that happened occasionally before, but this model does it a lot more) and sometimes swaps out words entirely (which is novel behavior).
In contrast, the same model before fine-tuning, called with response_format at inference time, seems to perform better; and after fine-tuning on the same ~500 samples with the serialized JSON placed in content (see the sketch below), it seems to perform noticeably better.
Admittedly this is a handwavy evaluation, but there seems to be a noticeable difference.
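For reference, the content-based fine-tuning format mentioned above just puts the serialized JSON in the assistant message's content. A minimal sketch of one training line (again with the toy x/y schema, shown as a TypeScript object literal, and assumed rather than taken from the actual compiled dataset) might be:

```ts
// One JSONL training line for the content-based format (sketch; x/y are placeholders)
const sampleLine = {
  messages: [
    { role: 'system', content: '...system prompt...' },
    { role: 'user', content: '...task input...' },
    // the expected structured output is serialized directly into content;
    // no tools array and no tool_calls are needed
    { role: 'assistant', content: JSON.stringify({ x: 1, y: 2 }) },
  ],
};
```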