Grad through stop_gradient

Why this matters

lax.stop_gradient(x) blocks autodiff from flowing through x. From the perspective of the differentiation engine, the wrapped value is treated as a constant — even though the forward computation is unchanged.

This is a foundational pattern in JAX because:

Target networks (DQN, DDPG) — the target Q-value should not receive gradients; wrapping it with stop_gradient enforces this cleanly.
Straight-through estimators — let a discrete forward pass influence the backward via a continuous surrogate, but stop the gradient from flowing back through the discrete op.
Freezing sub-networks — treat a pretrained encoder as a feature extractor by stop_gradient-ing its output before computing the loss.

Worked mini-example

import jax
import jax.numpy as jnp
from jax import lax

x = jnp.array([1.0, 2.0])
w = 3.0

def loss(w):
    return jnp.sum((lax.stop_gradient(x) * w) ** 2)

g = jax.grad(loss)(w)
# grad w.r.t. w = 2 * w * sum(x²) = 2 * 3 * 5 = 30.0
# grad w.r.t. x would be 0 if we tried (stop_gradient blocks it)

Common pitfalls

Only blocks gradient — not the forward value — stop_gradient(x) still returns x in the forward pass; only autodiff is blocked.
Wrong target — wrapping the wrong argument (e.g., wrapping w instead of x) blocks all gradient and defeats the purpose.
Alternative: jax.lax.stop_gradient — accessible as either jax.lax.stop_gradient or from jax import lax; lax.stop_gradient.

Problem

Implement grad_through_stop(x, w) that:

Defines loss(w) = sum((stop_gradient(x) * w)²).
Returns jax.grad(loss)(w) — the gradient w.r.t. w only.

x: 1-D jax array.
w: scalar.

Returns: scalar — equals 2 * w * sum(x²).

Grad through stop_gradient

Why this matters

Worked mini-example

Common pitfalls

Problem

Hints