-
-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
code style #227
Comments
This is exactly the opposite approach of what we set out to do with the Metalhead re-write in 0.6. Copying and pasting a bottleneck block over and over is good for pedagogy or examples like in the model zoo, but it isn't great software practice. If we saw that level of repetition in a codebase, someone would end up writing tooling or a framework to generalize that. There's a balance between HF transformers and generalizing to the point of confusion. Perhaps we've gone too far in one direction, but right now I think code organization/naming and documentation are what's keeping the current codebase from being clear to fresh eyes. |
We made an effort to separate the building blocks from the models. But maybe we can reorganize to have the building block files (mbconv.jl, etc.) live right next to the model file that uses them. So still following the model + sub-module organization that Carlo suggested above. |
I think most of the readability question can easily be addressed by good docs. I was short on time and secretly hoping that Pollen would be more mature by the time I could find a block to work on this, but it looks like that might take a while longer. I am busy for a little bit more but over the course of this summer I hope to polish the docs, make other improvements and get a proper v0.8 release in. |
It's not all on you. I will make time this week to do these cleanup steps so that additional contributors (like Carlo) can help, and we can release 0.8 soon (it is way overdue). |
now that I'm more familiar with the library I can see the logic of it |
Reopening with another stylistic comment. Models here rely a lot on As an example, below I access a specific layer inside two equivalent implementations of ViT, one from here and the other from torchvision: julia> pymodel.encoder.layers[0].ln_1
Python LayerNorm: LayerNorm((768,), eps=1e-06, elementwise_affine=True)
julia> jlmodel.layers[1][5][1][1].layers[1]
LayerNorm(768) # 1_536 parameters |
One thing we want to do but have not done yet is to use the named syntax for Chain(resblock0 = Parallel(..., residual = ..., bottleneck = ...), ...) Similar to Flax but without defining new types or monolith types that "do it all." Part of the reason that syntax got added to Flux is to address this style issue in Metalhead. |
I'm having a hard time understanding how a simple model like
ResNet(18)
is created.I think in general for a library like this it would be good to favor
at the cost of some code duplication and verbosity. This is basically the philosophy adopted by hugging face transformers.
So each model should have its own file and define its own submodules.
The text was updated successfully, but these errors were encountered: