Feat/L2 Regularization #270

L-M-Sherlock · 2025-01-10T07:13:19Z

related PR:

Expt/regularization of parameters fsrs-optimizer#157

Expertium · 2025-01-14T18:30:15Z

I'm currently trying out a different regularization method (well, it wasn't really made for regularization, but it kind of acts like a regularizer and improves generalization). If it works well, we can ditch L2, which barely improves the metrics. So don't merge this just yet. If it does work, you'd have to un-merge the related PRs, but oh well.

Expertium · 2025-01-16T20:53:56Z

The results ended up being underwhelming.
I implemented AutoClip for gradients. Relevant repo: https://github.com/pseeth/autoclip. The code is very simple: I implemented it in other.py in just 16 lines.
The key idea is simple: keep track of the observed gradient norms and calculate their p-th percentile (clip_value = np.percentile(grad_history, clip_percentile)). Then, whenever a later gradient norm exceeds that value, clip it with torch.nn.utils.clip_grad_norm_(self.model.parameters(), clip_value). This is kind of like outlier filtering, but with gradients.
From their paper:

The authors recommend using p=10, i.e. the 10th percentile, which is extremely aggressive, but perhaps suitable for large neural nets.
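For readers who want to see the clipping rule spelled out, here is a minimal dependency-free sketch of the idea described above. It is not the code from other.py: the class name AutoClipSketch and the helper percentile are hypothetical, gradients are plain Python lists, and the choice to add the current norm to the history before computing the threshold is an assumption. In a real training loop the clipping step would be torch.nn.utils.clip_grad_norm_ as mentioned above.

```python
import math


def percentile(values, p):
    """p-th percentile with linear interpolation between neighbouring ranks
    (the default behaviour of np.percentile)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


class AutoClipSketch:
    """Hypothetical illustration of percentile-based gradient clipping."""

    def __init__(self, clip_percentile=10.0):
        self.clip_percentile = clip_percentile
        self.grad_history = []  # every gradient norm observed so far

    def clip(self, grad):
        # L2 norm of the (flattened) gradient.
        norm = math.sqrt(sum(g * g for g in grad))
        # Assumption: the current norm is recorded before the threshold is computed.
        self.grad_history.append(norm)
        clip_value = percentile(self.grad_history, self.clip_percentile)
        if norm > clip_value and norm > 0.0:
            # Rescale so the clipped gradient has norm equal to clip_value.
            scale = clip_value / norm
            return [g * scale for g in grad]
        return list(grad)


clipper = AutoClipSketch(clip_percentile=50.0)
first = clipper.clip([3.0, 4.0])   # norm 5.0, nothing to compare against yet
second = clipper.clip([6.0, 8.0])  # norm 10.0, above the running median
```

With clip_percentile=10, most steps get rescaled, which is why the recommended setting counts as aggressive.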

I tried different values of clip_percentile on FSRS-5, and sadly it barely improves the metrics.
So let's use L2 regularization.

L-M-Sherlock · 2025-01-17T09:44:47Z

Wow, L2 regularization has a very significant impact on the distribution of w[12], w[14], w[15], and w[16].
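As a rough illustration of why an L2 term can reshape parameter distributions, here is a generic sketch of a squared-distance penalty and its effect on one gradient step. It is not the code from this PR: the function names, the reference vector w_ref, and the learning rate are hypothetical, and the thread does not say whether the penalty is measured from zero or from some default values, so the sketch accepts either. Only the name gamma for the penalty strength is taken from the commit list.

```python
def l2_penalty(w, w_ref, gamma):
    """gamma * sum((w_i - w_ref_i)^2): grows as parameters drift from w_ref."""
    return gamma * sum((wi - ri) ** 2 for wi, ri in zip(w, w_ref))


def l2_penalty_grad(w, w_ref, gamma):
    """Gradient of the penalty with respect to each parameter."""
    return [2.0 * gamma * (wi - ri) for wi, ri in zip(w, w_ref)]


def sgd_step(w, data_grad, w_ref, gamma, lr):
    """One plain gradient-descent step on data loss plus penalty."""
    reg_grad = l2_penalty_grad(w, w_ref, gamma)
    return [wi - lr * (dg + rg) for wi, dg, rg in zip(w, data_grad, reg_grad)]


w = [1.0, 3.0]
w_ref = [0.0, 1.0]  # hypothetical reference values (zeros give plain weight decay)
penalty = l2_penalty(w, w_ref, gamma=0.5)
# With no data gradient, the step moves every parameter toward its reference.
stepped = sgd_step(w, data_grad=[0.0, 0.0], w_ref=w_ref, gamma=0.5, lr=0.1)
```

Parameters that the data constrains only weakly get little data gradient, so the penalty dominates and pulls them toward the reference, which is one plausible reason some weights shift more than others.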

L-M-Sherlock added 2 commits January 10, 2025 14:50

Feat/L2 Regularization

3d91db4

add test

5875d66

L-M-Sherlock added the enhancement New feature or request label Jan 10, 2025

L-M-Sherlock added 4 commits January 10, 2025 16:09

replace config.batch_size with real_batch_size

4331143

add more tests

75562c8

update default gamma

ceeb5e4

bump version

a21d9ce

L-M-Sherlock added 3 commits January 17, 2025 10:16

bump version

3f4c438

Merge branch 'main' into Feat/L2-Regularization

165bf84

specify cargo-llvm-cov version in CI

3aaad10

L-M-Sherlock requested a review from asukaminato0721 January 18, 2025 04:30

asukaminato0721 approved these changes Jan 18, 2025

L-M-Sherlock merged commit f11c30d into main Jan 18, 2025
3 checks passed

L-M-Sherlock deleted the Feat/L2-Regularization branch January 18, 2025 05:21
