Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revise training script to update model #41

Merged
merged 50 commits into from
Feb 3, 2025

Conversation

rlskoeser
Copy link
Contributor

@rlskoeser rlskoeser commented Jan 14, 2025

work related to #39 #34 and closing out #15

  • add update option to htr2hpc-train script
    • set accuracy and file-size when uploading model
  • finish implementing train celery task / htr2hpc integration
  • delete model on training error (unless pre-existing / overwrite)

@rlskoeser rlskoeser changed the base branch from main to develop January 14, 2025 21:56
@rlskoeser rlskoeser force-pushed the feature/script-update-model branch 2 times, most recently from ddb581b to 129f94b Compare January 15, 2025 17:20
@rlskoeser rlskoeser force-pushed the feature/script-update-model branch from 129f94b to 103b36d Compare January 15, 2025 17:23
@rlskoeser rlskoeser force-pushed the feature/script-update-model branch from 5817c6b to ea3e152 Compare January 16, 2025 17:36
@rlskoeser rlskoeser requested a review from cmroughan January 16, 2025 17:59
@rlskoeser rlskoeser force-pushed the feature/script-update-model branch from 6640513 to 5593f96 Compare January 28, 2025 19:33
@rlskoeser rlskoeser force-pushed the feature/script-update-model branch from 2fd9719 to bf131ab Compare January 28, 2025 20:23
@rlskoeser rlskoeser force-pushed the feature/script-update-model branch from 3b95c4c to 5f6ea94 Compare January 29, 2025 18:36
@rlskoeser rlskoeser force-pushed the feature/script-update-model branch from 6c6b79b to 67b105b Compare January 29, 2025 20:09
@rlskoeser rlskoeser force-pushed the feature/script-update-model branch from c2842b6 to cea5829 Compare January 29, 2025 20:52
@cmroughan
Copy link
Collaborator

cmroughan commented Feb 3, 2025

There's a bug causing the script to crash with a "'NoneType' object has no attribute 'file'" report. It's occurring after the run.py portion -- I think it might be happening in the portion of tasks.py dealing with refreshing model data from the database, though I have not had the time to fully isolate it.

Cases where it's occurring:

  • segmentation/transcription jobs training from scratch
  • segmentation/transcription jobs refining on a model with "overwrite" checked

It does not occur on seg/trans jobs refining on a model but without "overwrite" checked. It also does not occur if an earlier step of the script returns "No best model found".

Update:

Found the bug in the logs, it is indeed happening in tasks.py around where I thought:

File "/srv/www/escriptorium/app/env/lib/python3.11/site-packages/htr2hpc/tasks
.py", line 383, in train
    if model.file is None or model.file == model.parent.file:

@cmroughan
Copy link
Collaborator

I would request adjusting tasks.py and run.py to pass forward information about whether the "overwrite" checkbox is ticked or not. If it is ticked, it would be good to pass that info into upload_best_model() and get_best_model(). The logic f"Must be better than original model {original_model.name} accuracy {best_accuracy:0.3f}" etc should only run if "overwrite" is checked. If it is not checked, the new model can be uploaded into eScriptorium even if the accuracy is lower than the original model we're refining on (because it won't replace it). If it is checked, then the new model should not be uploaded (which is the current behavior).

This logic is because someone could select some arbitrary model to use as a foundation for refining on and they will still want the resulting model even if its accuracy is lower than the foundation model's. For example, my taking an existing model for Latin and using it to jumpstart training a Greek model -- I don't care if the output Greek model is 96% to the original Latin's 98% because it's a wholly new model for me. But if "overwrite" is checked, then in that use case I am presumably training a Latin model on top of the old Latin model and then I do want to make sure I'm not deleting a older but better model by overwriting on top of it.

@rlskoeser rlskoeser merged commit 4208473 into develop Feb 3, 2025
@rlskoeser rlskoeser deleted the feature/script-update-model branch February 3, 2025 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants