
Why only tune parameters in certain layers #5

Open
pengshancai opened this issue Mar 2, 2022 · 2 comments

@pengshancai

Dear authors,
I have the following questions about the code:

  1. Why are only the parameters in certain layers (e.g., the 6th and 7th layers) fine-tuned (see the sketch after this list)? Is this out of concern for training speed?
  2. The code only provides training on news titles. Is there any concern (e.g., training time) with using abstracts or full passages instead?

Thanks!
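For concreteness, a minimal sketch of the layer-selective fine-tuning pattern question 1 refers to, assuming a Hugging Face BERT-style encoder whose parameter names contain `encoder.layer.<i>.`; the checkpoint name and matching logic here are illustrative, not the repo's actual code:

```python
# Hypothetical sketch: freeze every parameter except those in Transformer
# layers 6 and 7 of a BERT-style encoder.
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")  # illustrative checkpoint

for name, param in model.named_parameters():
    # Only the 7th and 8th encoder layers (indices 6 and 7) stay trainable;
    # everything else is frozen.
    param.requires_grad = ("encoder.layer.6." in name) or ("encoder.layer.7." in name)

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {n_trainable}")
```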
@wuch15 (Owner) commented May 12, 2022

Hi all, thanks for the comment. Sorry, one line of code is missing here. Please set `config.num_hidden_layers = 8` to ensure that only the first 8 layers are used. In practice we find that fine-tuning the first 8 layers is slightly better than fine-tuning the entire model (though this may not hold for all kinds of models). Thus, the 7th and 8th layers (indices 6 and 7) are fine-tuned.
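A minimal sketch of that fix, assuming the Hugging Face `transformers` library and a BERT-base checkpoint (both illustrative; the repo's actual model class and checkpoint may differ):

```python
from transformers import BertConfig, BertModel

# Truncate the architecture to its first 8 Transformer layers.
config = BertConfig.from_pretrained("bert-base-uncased")
config.num_hidden_layers = 8

# from_pretrained loads weights for the 8 retained layers and ignores
# the checkpoint's remaining layers (it emits a warning about them).
model = BertModel.from_pretrained("bert-base-uncased", config=config)
print(len(model.encoder.layer))  # -> 8
```

With the truncated model, the layers with indices 6 and 7 are the top two layers, which matches the layer-selection logic asked about above.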
