Exponential Decay Schedule

Why this matters

Exponential decay lowers the learning rate smoothly and continuously. It was a common default before cosine annealing became dominant, and it is still used in reinforcement learning (e.g. some DQN and PPO setups) and wherever you want a simple, predictable decay without hand-picking the boundary steps that a piecewise-constant schedule requires.

The formula

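optax.exponential_decay(init_value, transition_steps, decay_rate), with its defaults transition_begin=0 and staircase=False, implements the following (init below stands for init_value):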

lr(step) = init * decay_rate ^ (step / transition_steps)

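The decay is continuous: the LR shrinks a little at every step, and over any window of transition_steps steps it is multiplied by exactly decay_rate. For 0 < decay_rate < 1 the LR decreases monotonically. Passing staircase=True to optax instead floors the exponent, so the LR stays flat within each period and drops by decay_rate at each multiple of transition_steps.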
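A minimal standalone sketch of the formula, assuming plain Python floats rather than optax's JAX arrays (the function name and signature here are illustrative, not optax's API). It checks that any window of transition_steps steps shrinks the LR by exactly decay_rate, and contrasts that with a staircase variant that floors the exponent:

```python
import math


def exponential_decay(step, init, decay_rate, transition_steps, staircase=False):
    """Illustrative sketch of lr(step) = init * decay_rate ** (step / transition_steps).

    staircase=True floors the exponent so the LR only changes at multiples
    of transition_steps (the same idea as optax's staircase flag).
    """
    p = step / transition_steps  # true (float) division: continuous decay
    if staircase:
        p = math.floor(p)  # discrete jumps every transition_steps steps
    return init * decay_rate ** p


init, rate, T = 0.1, 0.9, 100

# Continuous: any window of T steps shrinks the LR by `rate`, wherever it starts.
for start in (0, 37, 250):
    ratio = exponential_decay(start + T, init, rate, T) / exponential_decay(start, init, rate, T)
    print(f"lr({start + T}) / lr({start}) = {ratio:.6f}")

# Staircase: flat within a window, then a single drop at the boundary.
print([round(exponential_decay(s, init, rate, T, staircase=True), 4) for s in (0, 50, 99, 100, 150)])
```

The continuous form is what the table below tabulates; the staircase form only differs between multiples of transition_steps.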

Example with init=0.1, decay_rate=0.9, transition_steps=100:

step	LR
0	0.1 × 0.9^0 = 0.1
50	0.1 × 0.9^0.5 ≈ 0.0949
100	0.1 × 0.9^1 = 0.09
200	0.1 × 0.9^2 = 0.081
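The table values can be reproduced with a few lines of plain Python. This is a sketch that evaluates the formula directly; it does not call optax:

```python
# Reproduce the example table: init=0.1, decay_rate=0.9, transition_steps=100.
INIT, DECAY_RATE, TRANSITION_STEPS = 0.1, 0.9, 100


def lr(step):
    # Continuous exponential decay: the exponent is a float, so step 50 gives 0.9 ** 0.5.
    return INIT * DECAY_RATE ** (step / TRANSITION_STEPS)


table = {step: lr(step) for step in (0, 50, 100, 200)}
for step, value in table.items():
    print(f"{step:>4}  {value:.4f}")
```

This prints 0.1000, 0.0949, 0.0900 and 0.0810 for steps 0, 50, 100 and 200. With optax installed, the schedule returned by optax.exponential_decay(init_value=0.1, transition_steps=100, decay_rate=0.9) should give the same values at these steps, up to float32 rounding.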

Common pitfalls

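Not the same as polynomial decay: here step sits in the exponent, and the exponent is step / transition_steps, not step / total_steps. transition_steps is the length of one decay period, not the length of training.
decay_rate must be in (0, 1) for decay; decay_rate = 1 gives a constant LR and decay_rate > 1 makes it grow.
Cast step and transition_steps to int, but compute step / transition_steps with true (floating-point) division; integer division (//) floors the exponent and silently turns the schedule into a staircase.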
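The three pitfalls can be checked with a few lines of plain Python. The function names are hypothetical, and the polynomial form shown, (init - end) * (1 - step / T)^power + end clamped at step T, is a common shape used here only for contrast:

```python
def exponential(step, init, decay_rate, transition_steps):
    # Step sits in the exponent; transition_steps is one decay period, not the run length.
    return init * decay_rate ** (step / transition_steps)


def polynomial(step, init, end, power, transition_steps):
    # Common polynomial form: step sits in the base, the exponent `power` is fixed,
    # and the LR reaches `end` at transition_steps and stays there.
    frac = 1 - min(step, transition_steps) / transition_steps
    return (init - end) * frac ** power + end


# Pitfall 1: the shapes differ. Exponential never reaches zero; polynomial (end=0) does.
print(exponential(1000, 0.1, 0.9, 100), polynomial(1000, 0.1, 0.0, 1.0, 1000))

# Pitfall 2: decay_rate > 1 grows the LR; decay_rate == 1 keeps it constant.
print(exponential(200, 0.1, 1.1, 100), exponential(200, 0.1, 1.0, 100))

# Pitfall 3: integer *division* (//) silently turns the schedule into a staircase.
step, transition_steps = 50, 100  # already ints, as the casting rule requires
print(0.1 * 0.9 ** (step / transition_steps))   # continuous: about 0.0949
print(0.1 * 0.9 ** (step // transition_steps))  # accidental staircase: 0.1
```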

Inputs

step: scalar (cast to int).
init: initial learning rate.
decay_rate: factor by which the LR is multiplied over each span of transition_steps steps (0 < decay_rate < 1).
transition_steps: number of steps in one decay period (cast to int).
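A sketch of a function that follows these inputs, assuming plain Python numbers. The name exponential_decay_lr and the argument checks are hypothetical additions for illustration (the range check on decay_rate is stricter than optax, which does not enforce it):

```python
def exponential_decay_lr(step, init, decay_rate, transition_steps):
    """Hypothetical sketch following the Inputs list.

    step and transition_steps are cast to int, but the exponent uses true
    division so the decay stays continuous rather than stepwise.
    """
    step = int(step)
    transition_steps = int(transition_steps)
    if transition_steps <= 0:
        raise ValueError("transition_steps must be a positive integer")
    if not 0.0 < decay_rate < 1.0:
        raise ValueError("decay_rate must lie in (0, 1) for the LR to decay")
    return init * decay_rate ** (step / transition_steps)


print(exponential_decay_lr(50.0, 0.1, 0.9, 100))  # float step is cast to int 50
```

Rejecting decay_rate outside (0, 1) turns the "accidental growth" pitfall into an immediate error; drop that check if a constant or growing schedule is intended.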

Output