Cannot load pretrained weights for ResNet on master #206
Comments
I think the problem is
whereas the model implemented in Metalhead.jl has the structure
The comparison above is drawn from the structures mentioned in the error message.
@theabhirath @darsnack do we plan to tweak the model implementation to match torchvision?
For me it's working fine.
It works fine on the latest release (v0.7.3, commit b37bee7). However, I can confirm that it's broken on current master, so a commit between June 26 and Sept 2 broke the pre-trained ResNet. I tried commit 588d703 (July 21, 2022), but the ResNet function wasn't exported there. @theabhirath Since you were working on the ResNet implementation at the time, would you be able to take a look at this?
IIRC, the reason it's broken is that we made some structural changes to the model and didn't port the weights again. I'll be happy to port the weights, but we ran into issues with loading weights on 1.6 when the weights were ported on later Julia versions, due to minor edge cases like RNG changes. I'll take a look at what we can do to avoid that (of interest may be FluxML/Functors.jl#56, which should allow us to save only the trainable parameters and thus avoid these edge cases).
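The "save only the trainable parameters" idea can be pictured with a plain-Julia toy. Everything below is invented for illustration (the model structure, field names, and helper are not the Metalhead/Flux API): the point is that the saved artifact contains only arrays keyed by their path, with no layer types, closures, or RNG state that could vary across Julia versions.

```julia
# Toy model: a nested NamedTuple standing in for a Flux model.
# Field names are made up for this sketch.
model = (conv = (weight = rand(3, 3), σ = identity),
         bn   = (γ = ones(3), β = zeros(3), momentum = 0.1),
         head = (weight = rand(10, 3),))

# Recursively collect arrays into a flat Dict keyed by their path.
# Non-array leaves (functions, scalars, config) are dropped, so the
# result serializes without any version-sensitive state.
function collect_params!(out, x, prefix)
    if x isa AbstractArray
        out[prefix] = x
    elseif x isa NamedTuple
        for k in keys(x)
            path = prefix == "" ? string(k) : prefix * "." * string(k)
            collect_params!(out, x[k], path)
        end
    end
    return out
end

params = collect_params!(Dict{String,Any}(), model, "")
# params now holds only "conv.weight", "bn.γ", "bn.β", "head.weight".
```

Loading then becomes the reverse walk: visit the freshly constructed model, look each array up by its path, and copy it in place.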
That function is just a convenient wrapper, so we can already do this. The only remaining bit is to give the user-facing API a reasonable name so that people can actually remember it.
It seems like ResNeXt is also broken with the same error on master.
It's included under the general ResNet umbrella here, so all of the above posts apply. In the meantime, you could load the weights separately, patch up the mismatched portion, and call
Can you elaborate on what the "mismatched portion" is?
The mismatch (@theabhirath can correct me) is that the pre-trained models contain dropout layers, while the model we are loading into may not, depending on the configuration. We added FluxML/Flux.jl#2041 specifically for this purpose, and I think that's what @ToucheSir is referencing.
Actually, the current mismatch is that the order of the weights in the PyTorch state dict doesn't match the order of our model, so iterating through both of them in parallel and simply trying to load the weights won't work. I am trying to see if something can be done about this.
Why can't we port the existing weights that were previously released instead of going back to PyTorch? Sure, that will leave out the new variants we've added, but it will at least let us release without a regression.
In the case of the iteration mismatch, we have access to the parameter keys in both dicts. Since those keys reflect the underlying structure, it should be possible to map one to the other semi-automatically. Then we can iterate over one dict and access the other by the matching key.
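The key-matching idea can be sketched in plain Julia. The key names and the mapping below are invented for illustration (the real torchvision and Metalhead key schemes differ); what matters is that weights are looked up by key, so the iteration order of either dict becomes irrelevant.

```julia
# A toy "PyTorch state dict": keys follow a torchvision-like naming scheme.
torch_state = Dict(
    "conv1.weight" => rand(3, 3),
    "bn1.weight"   => rand(3),
    "fc.weight"    => rand(10, 3),
)

# Assumed (hand-written or semi-automatically derived) mapping from
# torchvision-style keys to Metalhead-style parameter paths.
key_map = Dict(
    "conv1.weight" => "layers.1.conv.weight",
    "bn1.weight"   => "layers.1.bn.γ",
    "fc.weight"    => "layers.2.dense.weight",
)

# Re-key the weights by lookup rather than by parallel iteration.
flux_state = Dict(key_map[k] => v for (k, v) in torch_state)
```

With the weights re-keyed like this, the loading step can walk the Flux model's own structure and fetch each array by its path, instead of trusting that two unrelated iteration orders happen to line up.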
Package Version
0.8.0-DEV (master)
Julia Version
1.8.3
OS / Environment
Describe the bug
Running
errors with:
Steps to Reproduce
Install master and run the above code.
Expected Results
Expected it to load the weights into the model.
Observed Results
It threw the error.
Relevant log output
No response