
Implement Conv2d

Implement 2D cross-correlation (what PyTorch calls conv2d).

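A 2D convolution slides a kernel over the input. At each position it multiplies the kernel elementwise with the window beneath it and sums the products over all input channels and kernel positions, which gives one value of one output channel. Each output channel has its own kernel. This operation is the fundamental building block of convolutional neural networks.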

Math:

$$y_{n,o,i,j} = \sum_{c,\,p,\,q} x_{\,n,\,c,\;i \cdot s + p - \text{pad},\;j \cdot s + q - \text{pad}} \cdot w_{o,\,c,\,p,\,q}$$

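where $s$ is the stride, $\text{pad}$ is the zero-padding amount, and the sum runs over all input channels $c$ and kernel positions $(p, q)$ with $0 \le p < k_H$ and $0 \le q < k_W$. Any index of $x$ that falls outside the input lies in the zero-padded border and contributes $0$.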
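As a rough illustration of the formula, the sketch below computes a single output value $y_{n,o,i,j}$ with plain Python loops. It assumes `x` and `w` are nested lists rather than tensors, and the name `conv2d_element` is made up for this example.

```python
def conv2d_element(x, w, n, o, i, j, stride=1, pad=0):
    """Return y[n][o][i][j] for nested lists x (N, C_in, H, W) and w (C_out, C_in, kH, kW)."""
    C_in = len(w[o])
    kH, kW = len(w[o][0]), len(w[o][0][0])
    H, W = len(x[n][0]), len(x[n][0][0])

    total = 0.0
    for c in range(C_in):
        for p in range(kH):
            for q in range(kW):
                row = i * stride + p - pad  # input row index from the formula
                col = j * stride + q - pad  # input column index from the formula
                # Indices outside the input fall in the zero-padded border,
                # so they add nothing to the sum.
                if 0 <= row < H and 0 <= col < W:
                    total += x[n][c][row][col] * w[o][c][p][q]
    return total
```

For a 3x3 input holding 1 to 9 and the 2x2 kernel `[[1, 0], [0, 1]]`, the top-left output without padding is 1 + 5 = 6.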

Note: this is cross-correlation, not true mathematical convolution (there is no kernel flip). PyTorch’s conv2d is also cross-correlation; the name is historical.
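To make the difference concrete, here is a small single-channel sketch (stride 1, no padding). The helper names `correlate2d` and `flip` are illustrative only. True convolution is obtained by flipping the kernel along both axes before sliding it.

```python
def correlate2d(x, k):
    """Cross-correlate a 2D list x with a 2D kernel k (stride 1, no padding)."""
    kH, kW = len(k), len(k[0])
    out_h, out_w = len(x) - kH + 1, len(x[0]) - kW + 1
    return [[sum(x[i + p][j + q] * k[p][q] for p in range(kH) for q in range(kW))
             for j in range(out_w)]
            for i in range(out_h)]


def flip(k):
    """Reverse the kernel along both axes (a 180-degree rotation)."""
    return [row[::-1] for row in k[::-1]]


x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
k = [[1, 0],
     [0, 2]]

cross_corr = correlate2d(x, k)       # what this problem asks for: [[11, 14], [20, 23]]
true_conv = correlate2d(x, flip(k))  # mathematical convolution:   [[7, 10], [16, 19]]
```

The two results differ whenever the kernel is not symmetric under a 180-degree rotation.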

Output shape:

$$H' = \lfloor (H + 2 \cdot \text{padding} - k_H) / \text{stride} \rfloor + 1$$

$$W' = \lfloor (W + 2 \cdot \text{padding} - k_W) / \text{stride} \rfloor + 1$$
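The output-shape formulas translate directly into integer arithmetic. The helper below is a sketch with a made-up name (`conv2d_output_hw`), and it assumes the kernel fits inside the padded input.

```python
def conv2d_output_hw(H, W, kH, kW, stride=1, padding=0):
    """Return (H', W') for the given input size, kernel size, stride, and padding."""
    # Floor division matches the floor in the formula when the kernel
    # fits inside the padded input (non-negative numerator).
    out_h = (H + 2 * padding - kH) // stride + 1
    out_w = (W + 2 * padding - kW) // stride + 1
    return out_h, out_w


print(conv2d_output_hw(32, 32, 3, 3, stride=1, padding=1))  # (32, 32)
print(conv2d_output_hw(32, 32, 3, 3, stride=2, padding=1))  # (16, 16)
print(conv2d_output_hw(7, 7, 3, 3, stride=2, padding=0))    # (3, 3)
```

A 3x3 kernel with stride 1 and padding 1 keeps the spatial size unchanged. Raising the stride to 2 halves it.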

Inputs:

  • x: tensor of shape (N, C_in, H, W).
  • w: tensor of shape (C_out, C_in, kH, kW).
  • stride: int (default 1) — applied to both H and W.
  • padding: int (default 0) — symmetric zero-padding on all 4 sides.

Output: tensor of shape (N, C_out, H', W'). No bias.
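One possible approach, sketched here with nested Python lists instead of tensors, is to zero-pad the input explicitly and then slide the kernel with the given stride. This is a slow reference for checking shapes and values, not an efficient or official solution. The function name `conv2d` and the list-based interface are assumptions of this example.

```python
def conv2d(x, w, stride=1, padding=0):
    """Cross-correlate x (N, C_in, H, W) with w (C_out, C_in, kH, kW); no bias."""
    N, C_in, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    C_out, kH, kW = len(w), len(w[0][0]), len(w[0][0][0])
    assert len(w[0]) == C_in, "w's second dimension must match x's channel count"

    # Zero-pad every channel symmetrically on all four sides.
    Hp, Wp = H + 2 * padding, W + 2 * padding
    xp = [[[[0.0] * Wp for _ in range(Hp)] for _ in range(C_in)] for _ in range(N)]
    for n in range(N):
        for c in range(C_in):
            for r in range(H):
                for s in range(W):
                    xp[n][c][r + padding][s + padding] = x[n][c][r][s]

    # Output size from the shape formulas (the padding is already in Hp, Wp).
    out_h = (Hp - kH) // stride + 1
    out_w = (Wp - kW) // stride + 1

    y = [[[[0.0] * out_w for _ in range(out_h)] for _ in range(C_out)] for _ in range(N)]
    for n in range(N):
        for o in range(C_out):
            for i in range(out_h):
                for j in range(out_w):
                    acc = 0.0
                    # Sum over input channels and kernel positions.
                    for c in range(C_in):
                        for p in range(kH):
                            for q in range(kW):
                                acc += xp[n][c][i * stride + p][j * stride + q] * w[o][c][p][q]
                    y[n][o][i][j] = acc
    return y
```

Padding first keeps the inner loop free of bounds checks, at the cost of one extra copy of the input.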

