Implement Stochastic Depth from “Deep Networks with Stochastic Depth” (Huang et al., 2016).
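During training, each residual block is randomly dropped with probability drop_prob. A dropped block contributes nothing, so the layer reduces to the identity (out = x); a kept block gives out = x + block_output.
During inference, every block is kept and its output is scaled by the survival probability (1 - drop_prob), so the output equals the expected training-time output.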
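The inference scaling can be derived by taking the expectation over the random keep/drop decision, treating x and block_output as fixed. Here the decision is written as a gate b, a symbol introduced only for this sketch:

$$b \sim \text{Bernoulli}(1 - \text{drop\_prob}), \qquad \text{out}_{\text{train}} = x + b \cdot \text{block\_output}$$
$$\mathbb{E}[\text{out}_{\text{train}}] = x + \mathbb{E}[b] \cdot \text{block\_output} = x + (1 - \text{drop\_prob}) \cdot \text{block\_output}$$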
Given:
- x: shape (batch, d), the block input, carried unchanged by the residual (skip) connection
- block_output: shape (batch, d), the output of the residual block
- drop_prob: float, the probability of dropping the block
- training: bool, whether the model is in training mode

During inference (training=False):
$$\text{out} = x + (1 - \text{drop\_prob}) \cdot \text{block\_output}$$
Note: For deterministic testing, we only test inference mode (training=False).
Output: Tensor of shape (batch, d).
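A minimal NumPy sketch of the function described above, assuming NumPy arrays as inputs. The function name `stochastic_depth` and the optional `rng` parameter are hypothetical choices for illustration. In training mode it draws a single keep/drop decision for the whole mini-batch, following the per-mini-batch dropping in Huang et al.; some implementations instead drop per sample, which this sketch does not do.

```python
import numpy as np


def stochastic_depth(x, block_output, drop_prob, training, rng=None):
    """Combine a residual block's output with its input using stochastic depth.

    x, block_output: arrays of shape (batch, d).
    drop_prob: probability of dropping the block, in [0, 1].
    training: if True, sample the keep/drop decision; otherwise scale deterministically.
    rng: optional np.random.Generator (hypothetical convenience parameter).
    """
    x = np.asarray(x, dtype=float)
    block_output = np.asarray(block_output, dtype=float)
    if x.shape != block_output.shape:
        raise ValueError("x and block_output must have the same shape")
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")

    if training:
        rng = np.random.default_rng() if rng is None else rng
        # One gate for the whole mini-batch: rng.random() is uniform on [0, 1),
        # so the block is kept with probability 1 - drop_prob.
        keep = rng.random() >= drop_prob
        # A dropped block reduces the layer to the identity.
        return x + block_output if keep else x.copy()

    # Inference: scale by the survival probability so the output matches
    # the expected training-time output.
    return x + (1.0 - drop_prob) * block_output
```

For example, with x = [[1, 2], [3, 4]], block_output = [[10, 20], [30, 40]], drop_prob = 0.5 and training=False, the function returns [[6, 12], [18, 24]].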