
Classifier-Free Guidance

Implement classifier-free guidance (CFG) — the linear combination used in diffusion models to blend unconditional and conditional noise predictions at inference time.

Formula

eps_guided = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

Equivalently:

eps_guided = (1 - guidance_scale) * eps_uncond + guidance_scale * eps_cond
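
The second form is the first with the eps_uncond terms collected. A quick numerical check of that equivalence might look like the sketch below; the array shapes, the seed, and the scale of 7.5 are arbitrary choices for illustration, not part of the problem statement.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen arbitrarily
eps_uncond = rng.normal(size=(4, 3))    # stand-in for the unconditional prediction
eps_cond = rng.normal(size=(4, 3))      # stand-in for the conditional prediction
guidance_scale = 7.5

# Form 1: start at the unconditional prediction and step along the difference.
form_delta = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Form 2: weighted sum whose weights (1 - s) and s add up to 1.
form_weighted = (1 - guidance_scale) * eps_uncond + guidance_scale * eps_cond

# The two forms should agree up to floating-point rounding.
forms_agree = np.allclose(form_delta, form_weighted)
print(forms_agree)
```

Form 1 needs one subtraction, one multiplication, and one addition per element, which is why it is the form usually written in code.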

Interpretation

guidance_scale    Result
0.0               Pure unconditional prediction
1.0               Pure conditional prediction
> 1.0             Extrapolates beyond the conditional prediction; pushes harder toward the prompt

A typical Stable Diffusion default is 7.5. Higher values produce images that match the prompt more closely but can reduce diversity or introduce artifacts.
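
To make the effect of the scale concrete, here is a small worked example on a single scalar. The values 0.25 and 0.5 and the helper name guide are made up for illustration; values of guidance_scale between 0 and 1 interpolate between the two predictions.

```python
def guide(eps_uncond, eps_cond, guidance_scale):
    # Same formula as above, applied to plain floats.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

eps_uncond, eps_cond = 0.25, 0.5   # illustrative values only

print(guide(eps_uncond, eps_cond, 0.0))   # 0.25  (pure unconditional)
print(guide(eps_uncond, eps_cond, 0.5))   # 0.375 (halfway between the two)
print(guide(eps_uncond, eps_cond, 1.0))   # 0.5   (pure conditional)
print(guide(eps_uncond, eps_cond, 7.5))   # 2.125 (well past the conditional value)
```

At a scale of 7.5 the result lies far outside the interval between the two predictions, which is the extrapolation described above.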

Why It Works

During training, a single model learns both a conditional prediction (text prompt given) and an unconditional prediction (prompt dropped out, usually 10–20% of the time). At inference, CFG amplifies the direction from the unconditional prediction to the conditional one, which mimics the effect of classifier guidance without a separate classifier model.
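
The prompt dropout used in training can be sketched as follows. This is a hedged illustration only: the function name drop_conditioning, the all-zeros null embedding, and the shapes are assumptions, and real training code may instead use the embedding of an empty prompt or a learned null token.

```python
import numpy as np

def drop_conditioning(cond, null_cond, drop_prob, rng):
    """Swap each sample's conditioning for null_cond with probability drop_prob."""
    n = cond.shape[0]
    drop = rng.random(n) < drop_prob   # shape (n,); True means "train unconditionally"
    # Broadcast the per-sample mask across the embedding dimension.
    mixed = np.where(drop[:, None], null_cond, cond)
    return mixed, drop

rng = np.random.default_rng(0)
cond = rng.normal(size=(8, 4))   # hypothetical prompt embeddings, one row per sample
null_cond = np.zeros(4)          # hypothetical "no prompt" embedding (an assumption)

mixed, drop = drop_conditioning(cond, null_cond, drop_prob=0.1, rng=rng)
print(mixed.shape, int(drop.sum()))
```

Rows where drop is True see the null embedding, so the same network also gets trained as an unconditional model.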

Reference

Ho & Salimans, “Classifier-Free Diffusion Guidance” (2021).

Inputs / Output

  • eps_uncond: tensor of shape (N, d) — model’s noise prediction without conditioning.
  • eps_cond: tensor of shape (N, d) — model’s noise prediction with conditioning.
  • guidance_scale: float scalar — how strongly to apply the conditional signal.

Output: eps_guided of shape (N, d).
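
One possible implementation matching this interface is sketched below with NumPy. The function name classifier_free_guidance and the explicit shape check are choices made here, not requirements of the problem, and this is not the site's reference solution.

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Blend two noise predictions of identical shape (N, d)."""
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_cond = np.asarray(eps_cond, dtype=float)
    if eps_uncond.shape != eps_cond.shape:
        raise ValueError(
            f"shape mismatch: {eps_uncond.shape} vs {eps_cond.shape}"
        )
    # Step from the unconditional prediction along the conditional direction.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

eps_uncond = np.zeros((2, 3))   # toy inputs with N = 2, d = 3
eps_cond = np.ones((2, 3))
eps_guided = classifier_free_guidance(eps_uncond, eps_cond, 7.5)
print(eps_guided.shape)   # (2, 3)
```

The shape check is there because NumPy broadcasting would otherwise silently accept some mismatched inputs, such as (N, d) against (d,).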

