Hi author, thanks for providing this BERT meta-learning implementation.

I have a question about the Reptile part of the code. In the gradient computation in reptile.py (line 90), `gradient = meta_params - fast_params`: why isn't this the updated model parameters minus the pre-update parameters? I tried swapping the two and found it made little difference to model performance, but updating the model in the opposite direction should make it fail to train at all.
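For concreteness, here is a minimal sketch of how I understand the outer update when this pseudo-gradient is handed to an ordinary gradient-descent step (the function name `reptile_outer_step` and the loop structure are my own illustration, not the repo's actual code):

```python
import torch

# If the outer optimizer performs plain gradient *descent*,
#     theta <- theta - outer_lr * gradient,
# then feeding it gradient = meta_params - fast_params gives
#     theta <- theta - outer_lr * (theta - phi)
#            = theta + outer_lr * (phi - theta),
# i.e. a step toward the adapted fast weights, which is the Reptile update.

def reptile_outer_step(meta_params, fast_params, outer_lr=5e-5):
    """One hand-written SGD step using (meta - fast) as the pseudo-gradient."""
    with torch.no_grad():
        for theta, phi in zip(meta_params, fast_params):
            gradient = theta - phi        # meta_params - fast_params
            theta -= outer_lr * gradient  # equals theta + outer_lr * (phi - theta)
```

So if I understand correctly, the sign only looks reversed because the optimizer itself subtracts the gradient; is that the intent here?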
Also, why is the outer learning rate set to such a small value (5e-5)? The inner learning rate is already small, so if a small outer learning rate is multiplied by the averaged pseudo-gradient, won't the resulting update be tiny?
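To make my concern concrete, here is the rough back-of-the-envelope arithmetic I have in mind (the inner learning rate, step count, and gradient norm below are made-up illustrative values, not the repo's settings):

```python
# After k inner SGD steps with learning rate inner_lr, the pseudo-gradient is
#     meta_params - fast_params = inner_lr * sum(inner_gradients),
# so the outer update scales roughly like outer_lr * inner_lr * k * |avg inner grad|.
inner_lr = 5e-5        # illustrative value, not necessarily the repo's setting
outer_lr = 5e-5        # the value this issue asks about
k = 5                  # hypothetical number of inner steps
avg_grad_norm = 1.0    # hypothetical average inner-gradient norm

effective_step = outer_lr * inner_lr * k * avg_grad_norm
print(f"effective outer step scale: {effective_step:.2e}")  # ~1.25e-08
```

With two small learning rates multiplied together, the effective step seems almost negligible, so I am curious how the model still trains.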
Looking forward to your reply, thanks!