-
Can we repair runs (or their repairs) manually? In the dashboard, it seems we can only use a prompt to repair a run or sample. I have found prompt-based repair frustrating at times because the output often adds chatty text around the answer.

In one of my use cases, the model is asked to output data in CSV format. Sometimes it forgets a trailing separator or something similar; in such a case, it would be simpler for me to edit the output than to explain where the mistake is. If I write a prompt to repair it, it usually repairs it successfully, but it adds chatty text around the actual CSV, so the repaired answer still needs changes. Manual repair of a run (and/or manual repair of a repair of a run) in the UI would allow fixing minor problems like this, where the output is almost correct.
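For a mechanical defect like a missing trailing separator, the fix is even simple enough to script outside the UI. A minimal sketch (the function name and the assumption that the expected column count is known are mine, not part of Kiln):

```python
import csv
import io

def pad_csv_rows(raw_csv: str, expected_cols: int) -> str:
    """Pad short rows with empty fields so every row has expected_cols columns.

    Illustrative helper, not part of Kiln: repairs outputs where the model
    dropped a trailing separator (i.e. the last field is missing).
    """
    reader = csv.reader(io.StringIO(raw_csv))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Append empty fields until the row has the expected width.
        row = row + [""] * (expected_cols - len(row))
        writer.writerow(row)
    return out.getvalue()

# A row missing its trailing separator gets an empty final field restored.
print(pad_csv_rows("a,b,\nc,d", 3))  # → "a,b,\nc,d,\n"
```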
-
I can actually just edit the JSON files, save, and reload the page. That is good enough for my particular problem, though doing it in the dashboard would be easier for non-technical people.
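The edit-and-reload workflow can also be scripted. A rough sketch of the load/edit/save round trip; the `"output"` key is hypothetical, so inspect your own run JSON to find the field that actually holds the model's answer:

```python
import json
from pathlib import Path

def fix_run_output(path: Path, fixed_output: str) -> None:
    """Replace the stored output in a run file with a hand-corrected one.

    Illustrative only: the real field names in Kiln's run files may differ.
    """
    data = json.loads(path.read_text())
    data["output"] = fixed_output  # hypothetical field name
    path.write_text(json.dumps(data, indent=2))
```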
-
Manually editing certainly works for now, as you discovered. I'll see if I can adapt the auto-prompt to be more explicit about not including a preamble before the content.

For a more "traditional" fix, add the needed guidance to the repair instruction until it gets it right, something like "Only include the result in your response, do not reply with any pre-amble". Then start using the "multi-shot repair" prompt, which will include your previous guidance in the prompt. After a few examples, it should start getting it right with that extra help.

I'm intentionally trying to capture what the LLM needs to get it right, so it can learn to improve quickly. It improves a little faster when it has LLM-comprehensible repair instructions, not just input/output pairs. That said, having both in the UI is a good idea.
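To make the multi-shot idea concrete, here is a sketch of how accumulated guidance could compose into one repair request. This is not Kiln's actual prompt construction, and the function name is illustrative; it only shows the shape of the technique:

```python
def build_repair_prompt(original_output: str, guidance: list[str]) -> str:
    """Assemble a repair prompt that carries forward prior guidance.

    Sketch only: Kiln's real multi-shot repair prompt will differ. Each
    piece of previously captured guidance becomes a bullet point, so the
    model sees every correction learned from earlier repairs.
    """
    lines = ["Repair the output below."]
    lines += [f"- {g}" for g in guidance]
    lines += ["", "Output to repair:", original_output]
    return "\n".join(lines)

prompt = build_repair_prompt(
    "Here is your CSV:\na,b",
    ["Only include the result in your response, do not reply with any pre-amble."],
)
print(prompt)
```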
-
You're 100% on the right path. You can use any valid JSON Schema in the output_schema field, including arrays, objects, limits, etc. That will probably make everything run smoother. Just manually edit task.kiln, or use the Python API. The UI will always lag behind what JSON Schema can do (nested arrays, objects, limits), but more technical users can just set it manually.
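As an example of the kind of schema the UI doesn't yet expose, here is a sketch combining nested arrays, objects, and limits. The field names are illustrative, not from any real task; the schema itself is valid JSON Schema you could place in the output_schema field:

```python
import json

# Hypothetical example: an array of scored records, capped at 100 items,
# with each score constrained to the range [0, 1].
output_schema = {
    "type": "object",
    "properties": {
        "rows": {
            "type": "array",
            "maxItems": 100,
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "score": {"type": "number", "minimum": 0, "maximum": 1},
                },
                "required": ["name", "score"],
            },
        }
    },
    "required": ["rows"],
}

# The output_schema field expects JSON, so serialize before pasting into task.kiln:
print(json.dumps(output_schema, indent=2))
```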
-
FYI: I added UI to let you do this for new tasks, which will be in the next release. Manual editing works great for now.