Implement label smoothing from “Rethinking the Inception Architecture” (Szegedy et al., 2016).
Label smoothing replaces hard one-hot targets with soft targets to prevent overconfidence:
$$y_i' = (1 - \epsilon) \cdot y_i + \frac{\epsilon}{K}$$
Where $y_i$ is the hard one-hot target for class $i$, $\epsilon$ is the smoothing factor, and $K$ is the number of classes.

Given:
- `labels`: shape `(batch,)` — integer class labels
- `n_classes`: integer $K$
- `epsilon`: float smoothing factor
Output: Tensor of shape (batch, n_classes) — smoothed label distribution.
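A minimal NumPy sketch of the formula above (the function name `smooth_labels` is illustrative, not part of the spec):

```python
import numpy as np

def smooth_labels(labels: np.ndarray, n_classes: int, epsilon: float) -> np.ndarray:
    """Return the smoothed label distribution, shape (batch, n_classes)."""
    # Build hard one-hot targets y_i by indexing rows of the identity matrix.
    one_hot = np.eye(n_classes)[labels]          # (batch, n_classes)
    # Apply y' = (1 - eps) * y + eps / K elementwise; rows still sum to 1.
    return (1.0 - epsilon) * one_hot + epsilon / n_classes
```

For example, `smooth_labels(np.array([0, 2]), 3, 0.1)` puts $0.9 + 0.1/3 \approx 0.9333$ on the true class and $0.1/3 \approx 0.0333$ on each of the others.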