Train Image Classifier End-to-End

Train a tiny CNN end-to-end from scratch — no F.conv2d, no F.max_pool2d, no nn.*. Implement every layer by hand, including the full backward pass.

The pipeline (one epoch)

Forward:

conv   = conv2d(x, w_conv)              # (N, C_out, H-kH+1, W-kW+1)
relu   = max(conv, 0)                   # elementwise, same shape as conv
pool   = max_pool_2x2_stride2(relu)     # (N, C_out, H', W')
flat   = pool.reshape(N, -1)            # (N, C_out * H' * W')
logits = flat @ w_fc                    # (N, C)
probs  = softmax(logits, axis=-1)
loss   = -mean(log(probs[arange(N), y]))

where H' = (H - kH + 1) // 2 and W' = (W - kW + 1) // 2.
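The forward steps above can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes NumPy arrays in place of tensors, and the helper names conv2d_naive, max_pool_2x2 and forward are made up for this example. The conv loops over output positions (i, j), the pool drops a trailing odd row or column to match the floor division in H' and W', and the softmax subtracts the row-wise max for stability.

```python
import numpy as np

def conv2d_naive(x, w):
    """Valid cross-correlation: stride 1, no padding, one input channel."""
    N, _, H, W = x.shape
    C_out, _, kH, kW = w.shape
    out = np.zeros((N, C_out, H - kH + 1, W - kW + 1))
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            patch = x[:, 0, i:i + kH, j:j + kW]            # (N, kH, kW)
            # Contract the window against every filter at once -> (N, C_out).
            out[:, :, i, j] = np.tensordot(patch, w[:, 0], axes=([1, 2], [1, 2]))
    return out

def max_pool_2x2(a):
    """2x2 max-pool with stride 2; a trailing odd row/column is dropped."""
    N, C, H, W = a.shape
    Hp, Wp = H // 2, W // 2
    a = a[:, :, :2 * Hp, :2 * Wp]
    return a.reshape(N, C, Hp, 2, Wp, 2).max(axis=(3, 5))

def forward(x, y, w_conv, w_fc):
    conv = conv2d_naive(x, w_conv)
    relu = np.maximum(conv, 0)
    pool = max_pool_2x2(relu)
    flat = pool.reshape(x.shape[0], -1)
    logits = flat @ w_fc
    # Numerically stable softmax: shift each row by its max before exp.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    labels = y.astype(np.int64)        # y is delivered as floats
    loss = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
    return conv, relu, pool, flat, probs, loss
```

A quick sanity check: with w_fc set to all zeros the logits are zero, probs is uniform, and the loss equals log(C).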

Backward (full chain rule):

dlogits = (probs - one_hot(y)) / N      # (N, C)
dw_fc   = flat.T @ dlogits              # gradient for FC weights
dflat   = dlogits @ w_fc.T
dpool   = dflat.reshape(N, C_out, H', W')

# Max-pool backward: route gradient to the argmax position in each 2x2 window
drelu   = unpool_2x2(dpool, relu)

# ReLU backward
dconv   = drelu * (conv > 0).float()

# Conv backward: accumulate over all output positions
dw_conv[c, 0, kh, kw] = sum_{n,i,j} x[n, 0, i+kh, j+kw] * dconv[n, c, i, j] / N
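One possible way to write the backward steps above, again assuming NumPy arrays and made-up helper names (unpool_2x2 mirrors the pseudocode; conv_backward and backward are hypothetical). Two assumptions are worth stating: ties inside a 2x2 window send the gradient to the first maximum, and conv_backward applies the trailing / N exactly as the formula above writes it, even though dconv already carries the 1/N from dlogits.

```python
import numpy as np

def unpool_2x2(dpool, relu):
    """Route each pooled gradient to the argmax of its 2x2 window in relu."""
    N, C, Hp, Wp = dpool.shape
    drelu = np.zeros(relu.shape, dtype=dpool.dtype)
    n_idx, c_idx = np.indices((N, C))
    for i in range(Hp):
        for j in range(Wp):
            win = relu[:, :, 2 * i:2 * i + 2, 2 * j:2 * j + 2].reshape(N, C, 4)
            k = win.argmax(axis=-1)                 # first max wins on ties
            drelu[n_idx, c_idx, 2 * i + k // 2, 2 * j + k % 2] = dpool[:, :, i, j]
    return drelu

def conv_backward(x, dconv):
    """dw_conv[c,0,kh,kw] = sum_{n,i,j} x[n,0,i+kh,j+kw] * dconv[n,c,i,j] / N."""
    N, C_out, Ho, Wo = dconv.shape
    kH, kW = x.shape[2] - Ho + 1, x.shape[3] - Wo + 1
    dw = np.zeros((C_out, 1, kH, kW))
    for i in range(Ho):
        for j in range(Wo):
            patch = x[:, 0, i:i + kH, j:j + kW]     # (N, kH, kW)
            dw[:, 0] += np.einsum('nc,nhw->chw', dconv[:, :, i, j], patch)
    return dw / N                                   # / N as in the formula above

def backward(x, y, conv, relu, flat, probs, w_fc):
    N = x.shape[0]
    labels = y.astype(np.int64)                     # y is delivered as floats
    dlogits = probs.copy()
    dlogits[np.arange(N), labels] -= 1.0            # probs - one_hot(y)
    dlogits /= N
    dw_fc = flat.T @ dlogits
    dflat = dlogits @ w_fc.T
    Hp, Wp = relu.shape[2] // 2, relu.shape[3] // 2
    dpool = dflat.reshape(N, relu.shape[1], Hp, Wp)
    drelu = unpool_2x2(dpool, relu)
    dconv = drelu * (conv > 0)                      # ReLU gate
    return dw_fc, conv_backward(x, dconv)
```

Positions outside any pooling window (a trailing odd row or column of relu) receive zero gradient, since they never influenced pool.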

SGD update:

w_conv -= lr * dw_conv
w_fc   -= lr * dw_fc
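The update is repeated once per epoch on the same batch. The loop below is only a sketch of that outer structure: sgd_train and grad_fn are hypothetical names, with grad_fn standing in for a full forward and backward pass that returns (dw_conv, dw_fc). The weights are copied to float arrays first, an assumption made here so the in-place updates do not modify the caller's inputs.

```python
import numpy as np

def sgd_train(w_conv, w_fc, grad_fn, lr, n_epochs):
    # Work on float copies so the in-place updates leave the inputs untouched.
    w_conv = np.array(w_conv, dtype=np.float64)
    w_fc = np.array(w_fc, dtype=np.float64)
    for _ in range(n_epochs):
        dw_conv, dw_fc = grad_fn(w_conv, w_fc)   # one forward + backward pass
        w_conv -= lr * dw_conv
        w_fc -= lr * dw_fc
    return w_fc
```

With n_epochs=0 the loop body never runs, and with lr=0 every update subtracts zero, so both cases return w_fc unchanged.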

Inputs

x: shape (N, 1, H, W) — grayscale images (1 input channel).
y: shape (N,) — integer class labels in [0, C), delivered as floats.
w_conv: shape (C_out, 1, kH, kW) — conv filters, no bias, stride=1, padding=0.
w_fc: shape (C_out * H' * W', C) — FC output layer.
lr: float — learning rate.
n_epochs: int — number of full passes through the pipeline.
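The shape relationships among these inputs can be checked with a few lines of arithmetic. The helper below is a hypothetical convenience (expected_shapes is not part of the task), and the 28x28 example is an illustrative size, not one taken from the tests.

```python
def expected_shapes(x_shape, w_conv_shape, n_classes):
    """Derive intermediate and w_fc shapes from the input shapes."""
    N, _, H, W = x_shape
    C_out, _, kH, kW = w_conv_shape
    Hc, Wc = H - kH + 1, W - kW + 1      # conv output: stride 1, no padding
    Hp, Wp = Hc // 2, Wc // 2            # 2x2 pool, stride 2 (floor division)
    return {
        "conv": (N, C_out, Hc, Wc),
        "pool": (N, C_out, Hp, Wp),
        "w_fc": (C_out * Hp * Wp, n_classes),
    }

shapes = expected_shapes((8, 1, 28, 28), (4, 1, 5, 5), 10)
print(shapes["w_fc"])   # (576, 10)
```

Comparing shapes["w_fc"] against the w_fc actually passed in catches a mismatched kernel size or class count before any training step runs.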

Output

Return the final w_fc of shape (C_out * H' * W', C). Only w_fc is checked; w_conv is not, which keeps the tests simple.

Tips

Implement conv with explicit loops over output spatial positions (i, j).
For max-pool, store the argmax positions during the forward pass so you can scatter gradients back in the backward pass.
Use a numerically stable softmax: subtract the row-wise max before exponentiating.
y arrives as floats, so cast it to an integer type before using it to index probs or to build one_hot(y).
n_epochs=0 or lr=0 → w_fc is returned unchanged.
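The max-pool tip can be sketched like this: the forward pass records, for every 2x2 window, which of its four cells held the maximum, and the backward pass scatters the incoming gradient to exactly that cell. This assumes NumPy arrays; max_pool_forward and max_pool_backward are hypothetical names, and ties go to the first maximum in row-major order.

```python
import numpy as np

def max_pool_forward(a):
    """2x2/stride-2 max-pool; also returns the argmax (0..3) of each window."""
    N, C, H, W = a.shape
    Hp, Wp = H // 2, W // 2
    # Gather each window's four cells into a trailing axis: k = 2*row + col.
    win = (a[:, :, :2 * Hp, :2 * Wp]
           .reshape(N, C, Hp, 2, Wp, 2)
           .transpose(0, 1, 2, 4, 3, 5)
           .reshape(N, C, Hp, Wp, 4))
    idx = win.argmax(axis=-1)
    out = np.take_along_axis(win, idx[..., None], axis=-1)[..., 0]
    return out, idx

def max_pool_backward(dout, idx, in_shape):
    """Scatter dout back to the stored argmax positions; zeros elsewhere."""
    N, C, Hp, Wp = dout.shape
    dwin = np.zeros((N, C, Hp, Wp, 4), dtype=dout.dtype)
    np.put_along_axis(dwin, idx[..., None], dout[..., None], axis=-1)
    da = np.zeros(in_shape, dtype=dout.dtype)
    # Undo the window grouping to get back to the (2*Hp, 2*Wp) layout.
    da[:, :, :2 * Hp, :2 * Wp] = (dwin.reshape(N, C, Hp, Wp, 2, 2)
                                      .transpose(0, 1, 2, 4, 3, 5)
                                      .reshape(N, C, 2 * Hp, 2 * Wp))
    return da
```

Storing idx costs one small integer array per forward pass and avoids recomputing the argmax from relu during the backward pass.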
