
Why only tune parameters in certain layers #5

Open
pengshancai opened this issue Mar 2, 2022 · 2 comments

@pengshancai

Dear authors,
I have the following questions about the code:

  1. Why are only the parameters in certain layers (e.g., the 6th and 7th layers) fine-tuned (see the sketch after this list)? Is this out of concern for training speed?
  2. The code only provides training on news titles. Is there any concern (e.g., training time) with using abstracts or full passages instead?

Thanks!
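For concreteness, a minimal sketch of the layer-selective fine-tuning pattern question 1 refers to, assuming a Hugging Face BERT-style encoder whose parameter names contain `encoder.layer.<i>.`; the checkpoint name and matching logic here are illustrative, not the repo's actual code:

```python
# Hypothetical sketch: freeze every parameter except those in Transformer
# layers 6 and 7 of a BERT-style encoder.
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")  # illustrative checkpoint

for name, param in model.named_parameters():
    # Only the 7th and 8th encoder layers (indices 6 and 7) stay trainable;
    # everything else is frozen.
    param.requires_grad = ("encoder.layer.6." in name) or ("encoder.layer.7." in name)

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {n_trainable}")
```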
@wuch15 (Owner) commented May 12, 2022

Hi all, thanks for the comment. Sorry, one line of code is missing here. Please set `config.num_hidden_layers = 8` to ensure that only the first 8 layers are used. In practice we find that fine-tuning the first 8 layers is slightly better than fine-tuning the entire model (though this may not hold for all kinds of models). Thus, the 7th and 8th layers (indices 6 and 7) are fine-tuned.
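A minimal sketch of that fix, assuming the Hugging Face `transformers` library and a BERT-base checkpoint (both illustrative; the repo's actual model class and checkpoint may differ):

```python
from transformers import BertConfig, BertModel

# Truncate the architecture to its first 8 Transformer layers.
config = BertConfig.from_pretrained("bert-base-uncased")
config.num_hidden_layers = 8

# from_pretrained loads weights for the 8 retained layers and ignores
# the checkpoint's remaining layers (it emits a warning about them).
model = BertModel.from_pretrained("bert-base-uncased", config=config)
print(len(model.encoder.layer))  # -> 8
```

With the truncated model, the layers with indices 6 and 7 are the top two layers, which matches the layer-selection logic asked about above.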
