Optax

Gradient transformations, optimizer chains, learning-rate schedules, weight decay, EMA, and masking in Optax, the gradient processing and optimization library for JAX.


1. Optax SGD Step
2. Optax SGD with Momentum
3. Optax Adam Step
4. AdamW with Decoupled Weight Decay
5. Optax RMSprop Step
6. Constant Schedule
7. Linear Schedule
8. Warmup + Cosine Decay
9. Piecewise Constant Schedule
10. Exponential Decay Schedule
11. Chain: Clip + SGD
12. Adam + Weight Decay (Chain)
13. Global-Norm Gradient Clipping
14. Lookahead Optimizer Wrapper
15. multi_transform: Per-Parameter Groups
16. Optax EMA on Params
17. Gradient Accumulation via MultiSteps
18. masked: Apply Weight Decay Only to Certain Params
19. inject_hyperparams for Runtime LR
20. zero_nans for NaN-Safe Training
21. Full Training Step (Loss + Grad + Update)
22. Train Step with Frozen Params (Mask)
23. Train Step with Global-Norm Clipping
24. Train Step with Warmup Schedule
25. 4-Step Training Loop with Scan + Loss Curve