Implement the Squeeze-and-Excitation (SE) block from “Squeeze-and-Excitation Networks” (Hu et al., 2018).
SE blocks adaptively recalibrate channel-wise feature responses by:
Squeeze: Global average pooling across spatial dimensions $z_c = \frac{1}{H \times W} \sum_{i,j} x_{c,i,j}$
Excitation: Two FC layers to produce channel attention weights $s = \sigma(W_2 \cdot \text{ReLU}(W_1 \cdot z))$
Scale: Multiply original features by attention weights $\tilde{x}_c = s_c \cdot x_c$
For simplicity, the input is 2-D with shape (channels, spatial), i.e., the spatial dimensions $H \times W$ are flattened into a single axis $S = H \cdot W$.
Input:
x: shape (C, S) — C channels, S spatial positions
W1: shape (C//r, C) — squeeze weights (reduction ratio r)
W2: shape (C, C//r) — excitation weights
Output: Tensor of shape (C, S) — recalibrated features.
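The three steps above can be sketched in NumPy as follows. This is a minimal reference sketch, not the required solution: the function name `se_block` is my own, and it assumes bias-free FC layers (as in the formulas above) and NumPy arrays rather than framework tensors.

```python
import numpy as np

def se_block(x, W1, W2):
    """Squeeze-and-Excitation over a (C, S) input.

    x:  (C, S) feature map with flattened spatial axis
    W1: (C//r, C) squeeze weights
    W2: (C, C//r) excitation weights
    Returns a (C, S) array of recalibrated features.
    """
    # Squeeze: global average pool over the spatial axis -> z of shape (C,)
    z = x.mean(axis=1)
    # Excitation: FC -> ReLU -> FC -> sigmoid -> s of shape (C,)
    s = 1.0 / (1.0 + np.exp(-(W2 @ np.maximum(W1 @ z, 0.0))))
    # Scale: broadcast per-channel weights across the spatial positions
    return s[:, None] * x
```

Note the broadcasting in the last line: `s[:, None]` reshapes the (C,) attention vector to (C, 1) so it multiplies every spatial position of each channel.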