Stochastic Depth

Implement Stochastic Depth from “Deep Networks with Stochastic Depth” (Huang et al., 2016).

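During training, each residual block is randomly dropped with probability drop_prob, so that only the residual connection x passes through. During inference, the block is always kept and its output is scaled by (1 - drop_prob), the probability of keeping it, so that the result matches the expected value of the training-time output.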

Given:

x: shape (batch, d) — input (residual connection)
block_output: shape (batch, d) — output of the residual block
drop_prob: float — probability of dropping the block
training: bool — whether in training mode

During inference (training=False): $$\text{out} = x + (1 - \text{drop\_prob}) \cdot \text{block\_output}$$
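
The scaling factor can be motivated by taking the expectation of the training-time output. As a sketch, assume the training pass multiplies the block output by a random gate $b$ (a symbol introduced here for illustration, not part of the problem statement) that equals 0 with probability drop_prob and 1 otherwise, with x and block_output treated as fixed:

$$\mathbb{E}[x + b \cdot \text{block\_output}] = x + \mathbb{E}[b] \cdot \text{block\_output} = x + (1 - \text{drop\_prob}) \cdot \text{block\_output}$$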

Note: For deterministic testing, we only test inference mode (training=False).

Output: Tensor of shape (batch, d).
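
The description above can be sketched in NumPy as follows. This is a minimal illustration, not a reference solution: the function name `stochastic_depth`, the `rng` parameter, and the choice to draw a single keep/drop decision for the whole batch in training mode are assumptions made here. Only the inference branch is specified by the problem.

```python
import numpy as np


def stochastic_depth(x, block_output, drop_prob, training=False, rng=None):
    """Combine a residual input with a block output under stochastic depth.

    x, block_output: arrays of shape (batch, d).
    drop_prob: probability of dropping the block, in [0, 1].
    training: if False, use the deterministic inference formula.
    rng: optional np.random.Generator (hypothetical parameter for this sketch).
    """
    x = np.asarray(x, dtype=float)
    block_output = np.asarray(block_output, dtype=float)
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")

    if not training:
        # Inference: always keep the block, scaled by the keep probability.
        return x + (1.0 - drop_prob) * block_output

    # Training (assumption: one Bernoulli draw shared by the whole batch).
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random() >= drop_prob  # True with probability 1 - drop_prob
    return x + block_output if keep else x.copy()


# Usage: inference mode is deterministic, so it is straightforward to check.
x = np.array([[1.0, 2.0], [3.0, 4.0]])
block_output = np.array([[10.0, 20.0], [30.0, 40.0]])
out = stochastic_depth(x, block_output, drop_prob=0.2, training=False)
```

The training branch is included only to show where the randomness would enter; since it is non-deterministic, it falls outside what the problem tests.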
