Implement a residual (skip connection) block.
A residual block computes: $$\text{output} = \text{ReLU}(x + F(x))$$
where $F(x)$ is a two-layer transform: $$F(x) = W_2 \cdot \text{ReLU}(W_1 \cdot x + b_1) + b_2$$
This is the core building block of ResNets. The skip connection adds the
input x directly to the output of the transform, giving gradients a direct
path through the block and making very deep networks easier to train.
Input:
- x: input of shape (batch, dim)
- W1: shape (dim, dim), b1: shape (dim,)
- W2: shape (dim, dim), b2: shape (dim,)
Output: Tensor of shape (batch, dim).
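A minimal NumPy sketch of one possible solution. It assumes the batched row-vector convention, so the per-sample product $W \cdot x$ becomes `x @ W.T` for the whole batch; the function name `residual_block` is our own choice, not part of the spec.

```python
import numpy as np

def residual_block(x, W1, b1, W2, b2):
    """Compute ReLU(x + F(x)) with F(x) = W2 @ ReLU(W1 @ x + b1) + b2.

    x has shape (batch, dim); each weight matrix is (dim, dim) and each
    bias is (dim,), so the output shape matches x.
    """
    h = np.maximum(0.0, x @ W1.T + b1)   # first linear layer + ReLU
    fx = h @ W2.T + b2                   # second linear layer: F(x)
    return np.maximum(0.0, x + fx)       # skip connection, then final ReLU
```

With identity weights and zero biases the block reduces to `ReLU(x + ReLU(x))`, so a nonnegative input is simply doubled, which makes for a quick sanity check.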